Abstract

This article investigates parameter estimation of affine term structure models by means of the generalized method of moments. Exact moments of the affine latent process as well as of the yields are obtained by using results derived for $p$-polynomial processes. Then the generalized method of moments, combined with Quasi-Bayesian methods, is used to get reliable parameter estimates and to perform inference. After a simulation study, the estimation procedure is applied to empirical interest rate data.

1 Introduction

This article is concerned with parameter estimation and inference in affine term structure models. We use results of Cuchiero et al. (2012) on $p$-polynomial processes to obtain the exact conditional moments of a latent affine process driving the term structure. By assuming a stationary affine process, we obtain not only the exact moments of a vector of yields with various maturities but also the first-order auto-covariance matrices of the yields and the squared yields. Then we estimate the model parameters by means of the Generalized Method of Moments (GMM) introduced in Hansen (1982), where Quasi-Bayesian methods (see Chernozhukov and Hong, 2003) are used to minimize the GMM distance function. A further contribution of this paper is a rigorous study on testing market price of risk specifications discussed in quantitative finance literature. By considering the Wald test, we observe that test statistics obtained from output provided by Quasi-Bayesian methods strongly outperform test statistics which are obtained by standard procedures with respect to power and size.

Affine term structure models have their origin in the univariate models of Vasicek (1977) and Cox et al. (1985). The performance of these models and similar univariate setups was already investigated, for example, in Aït-Sahalia (1996a) and Aït-Sahalia (1996b). The articles show that these univariate parametric models inadequately describe the interest rate dynamics. Based on this finding, Aït-Sahalia (1996a), Aït-Sahalia (1996b) as well as Stanton (1997) proposed non-parametric interest rate models. As an alternative, Dai and Singleton (2000) and Dai and Singleton (2003) favored multivariate settings to circumvent the shortcomings of univariate models. This alternative modeling approach has the advantage that a mathematical framework, where bonds and derivatives can be priced in a straightforward way, is available.

Let us briefly discuss some literature on the performance of different estimation approaches: Regarding parameter estimation, Zhou (2001) studied the efficient method of moments (EMM), the GMM, the quasi-maximum likelihood estimation (QMLE) and the maximum likelihood estimation (MLE) for the Cox et al. (1985) model. In his study the author assumes that the instantaneous interest rate, driven by a square root process, can be observed. The most efficient results are observed for the MLE, which is followed by the QMLE and the EMM.^1 Regarding the GMM, this method performs well if the sample size is sufficiently large. In addition, Zhou (2003) constructed a GMM estimator by deriving moments for univariate latent processes by applying Ito's formula (under the same assumption that the instantaneous interest rate can be observed). This estimator has been compared to the ML estimator. In contrast to Zhou (2001), in this setup the GMM estimator performs quite well in the finite sample compared to the maximum likelihood estimation.

More recent literature has proposed different frequentist and Bayesian approaches to estimate the […] recently in Chib and Ergashev (2009); an earlier application is, e.g., Frühwirth-Schnatter and Geyer (1996). Regarding Bayesian estimation methods, Jones (2003) pointed out that strong priors are necessary to estimate the parameters in the case of a low degree of mean reversion (i.e., high persistence) of the […] terminology of Dai and Singleton, 2000) by Hamilton and Wu (2012).


Additional articles on parameter estimation for affine models are, e.g., Diebold et al., Duffee, Aït-Sahalia and Kimmel (2010), Egorov et al. (2011) and Joslin et al. An overview is provided in Piazzesi (2010). A further approach is to approximate the transition density of the affine process via approximations of the Chapman/Kolmogorov forward equation. This approach has been explored in a series of papers by Aït-Sahalia (see, e.g., Aït-Sahalia, 2002; Aït-Sahalia and Kimmel, 2010). Filipović et al. (2013) used the moments obtained in Cuchiero et al. (2012) to construct additional likelihood expansions.


In contrast to a lot of other approaches already used in the literature, we use the exact moments of the yields observed, arising from a multivariate affine term structure model. Neither an approximation of the moments (such as an approximation via the solution of the stochastic differential equation) nor an approximation of the likelihood is required. Since we have to minimize a GMM distance function in more than twenty parameters, GMM estimation is nontrivial. To account for this problem, we use Quasi-Bayesian methods developed in Chernozhukov and Hong (2003). As standard errors of parameter estimates are byproducts of this estimation routine, we apply them in parameter testing, where we observe rejection rates of the true null hypothesis to be close to the theoretical significance levels. By contrast, when using standard routines to estimate the asymptotic covariance matrix of the unknown parameter vector, the performance of the Wald test, measured in terms of power and size, is very poor.

^1 For stochastic volatility models Andersen et al. (1999) have shown that the EMM estimator has almost the same efficiency as the maximum likelihood estimator.

This paper is organized as follows: Section [2] introduces affine term structure models. Section [3] applies results obtained in mathematical finance literature to calculate the moments of the latent process driving the yields and then derives the moments of the yields observed. Section [4] describes the small sample properties of the GMM estimator, while Section [5] applies the estimator to empirical data. Finally, Section [6] offers conclusions.


2 Affine Models


This section provides a brief description of affine models, which is mainly based on Filipović (2009). Consider the state space $\mathcal{S} = \mathbb{R}_+^m \times \mathbb{R}^n \subset \mathbb{R}^d$, where $m, n \ge 0$, $m + n = d$, and the filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, \mathbb{P})$. With $X(t) \in \mathcal{S}$, the stochastic process in continuous time $(X(t))_{t \ge 0}$ is generated by the following affine stochastic differential equation

$$dX(t) = \left(b^P + \beta^P X(t)\right) dt + \rho(X(t))\, dW^P(t), \tag{1}$$

where $b^P$ is a $d$-dimensional vector and $\beta^P$ and $\rho(x)$ are $d \times d$ matrices. The $d \times d$ diffusion term $a(x)$ is defined such that $a(x) = \rho(x)\rho(x)' = a + \sum_{i=1}^d x_i \alpha_i$, where $a$ and $\alpha_i$, $i = 1, \dots, d$, are $d \times d$ matrices. $W^P(t)$ is a $d$-dimensional standard Brownian motion. For more details the reader is referred to the Appendix. In an affine environment the instantaneous interest rate (short rate), $r(t) \in \mathbb{R}$, follows from

$$r(t) = \gamma_0 + \gamma_x' X(t), \tag{2}$$

where $\gamma_0$ is a scalar and $\gamma_x$ is a $d$-dimensional vector. We consider an arbitrage free market, where $P$ is the empirical measure and $Q$ is an equivalent martingale measure. We assume that the process $(X(t))_{t \ge 0}$ is affine also in the measure $Q$, such that

$$dX(t) = \left(b^Q + \beta^Q X(t)\right) dt + \rho(X(t))\, dW^Q(t), \tag{3}$$

where $W^Q(t)$ is a $d$-dimensional standard Brownian motion under the measure $Q$.

By equations (1) and (3), the stochastic process $(X(t))_{t \ge 0}$ is affine in both measures. While the diffusion parameters ($a$, $\alpha_i$, $i = 1, \dots, d$) remain the same under both measures, we have to consider the drift parameters $b^P$, $\beta^P$, $b^Q$ and $\beta^Q$ in both measures $P$ and $Q$. This specification, namely equations (1) and (3), is called the extended affine market price of risk specification, and its mathematical foundation is provided in Cheridito et al. (2007). These authors also show by means of the Girsanov theorem that $W^Q(t) = W^P(t) + \int_0^t \phi(X(s))\, ds$. For the affine class

$$\phi(X(t)) = \left(\rho(X(t))\right)^{-1} \left(b^P - b^Q + (\beta^P - \beta^Q) X(t)\right), \tag{4}$$

where $\phi(X(t)) \in \mathbb{R}^d$. The stochastic process $(\phi(X(t)))_{t \ge 0}$ is called the market price of risk process.


Remark 1. To observe how the market price of risk process $(\phi(X(t)))_{t \ge 0}$ is connected to risk premia, Cochrane (2005) [p. 339] provides a formal relationship between the process $(\phi(X(t)))_{t \ge 0}$ and the (instantaneous) Sharpe ratio.


We also assume that the process $(X(t))$ satisfies the admissibility conditions (under both measures), which ensure that the process $(X(t))$ does not leave the state space $\mathcal{S}$ (see Filipović, 2009, Theorem 10.2 and Appendix [E]). Next, we define the index sets $I = \{1, \dots, m\}$ and $J = \{m + 1, \dots, d\}$, where $m + n = d$. Let $b_I = (b_1, \dots, b_m)'$ and $\beta_{II} = \beta_{1:m,1:m}$. This notation, the admissibility restrictions (see Appendix [E]), the short-rate model (2) and the condition $E\left(\exp\left(-\int_0^{\tau} r(z)\, dz\right)\right) < +\infty$, for some $\tau \in \mathbb{R}_+$, imply that there exists a unique solution $(\Phi(t,u), \Psi(t,u)')' \in \mathbb{C} \times \mathbb{C}^d$ of the system of Riccati differential equations

$$\begin{aligned}
\partial_t \Phi(t,u) &= \tfrac{1}{2}\, \Psi_J(t,u)'\, a_{JJ}\, \Psi_J(t,u) + (b^Q)' \Psi(t,u) - \gamma_0, & \Phi(0,u) &= 0, \\
\partial_t \Psi_i(t,u) &= \tfrac{1}{2}\, \Psi(t,u)'\, \alpha_i\, \Psi(t,u) + (\beta_i^Q)' \Psi(t,u) - \gamma_{xi}, & &\text{for } i \in I, \\
\partial_t \Psi_J(t,u) &= (\beta_{JJ}^Q)' \Psi_J(t,u) - \gamma_{xJ}, & \Psi(0,u) &= u,
\end{aligned} \tag{5}$$

where $t \in [0, \tau]$, $u \in i\mathbb{R}^d$ and $\beta = (\beta_1, \dots, \beta_d)$, with $\beta_i$ being a $d$-dimensional vector, $i = 1, \dots, d$ (see Filipović, 2009, Theorem 10.4).^3 This system of ordinary differential equations is used to calculate the time $t$ price of a zero coupon bond, $\pi^0(t, \tau)$, with time to maturity $\tau$. The arbitrage free zero coupon model prices $\pi^0(t, \tau)$ and the model yields $y^0(t, \tau)$ follow from Filipović (2009) [Corollary 10.2]. That is

$$\pi^0(t, \tau) = \exp\left(\Phi(\tau, 0) + \Psi(\tau, 0)' X(t)\right) \quad \text{and} \quad y^0(t, \tau) = -\frac{1}{\tau} \log\left(\pi^0(t, \tau)\right) = -\frac{1}{\tau}\left(\Phi(\tau, 0) + \Psi(\tau, 0)' X(t)\right). \tag{6}$$

The time to maturity, $\tau$, and $u = 0$ are the arguments of the functions $\Phi(t, u)$ and $\Psi(t, u)$ described in (5). The parameters under $Q$ have to be used to derive $\Phi(\tau, 0)$ and $\Psi(\tau, 0)$.

^2 In this article we apply the following notation: For vectors and matrices we use boldface. If not otherwise stated, the vectors considered are column vectors. Given an $r_M \times c_M$ matrix $M$, the term $M_{r_a:r_b, c_a:c_b}$ stands for "from row $r_a$ to row $r_b$, and from column $c_a$ to column $c_b$ of matrix $M$". The abbreviation $M_{r_a:r_b,:}$ stands for "from row $r_a$ to row $r_b$ of matrix $M$", while ":" stands for all columns, i.e. columns $1$ to $c_M$. In addition, $M_{r_a:r_b, c_a}$ extracts the elements $r_a$ to $r_b$ of the column $c_a$. In addition, $\beta_{ij}$ stands for $[\beta]_{ij}$; $0_{a \times b}$ and $e_{a \times b}$ stand for $a \times b$ matrices of zeros and ones; $0_a$ and $e_a$ are used to abbreviate $0_{a \times 1}$ and $e_{a \times 1}$; $I_a$ is the $a \times a$ identity matrix, while $\mathbb{1}(\cdot)$ stands for an indicator function. Given a vector $x \in \mathbb{R}^n$, $\mathrm{diag}(x)$ transforms $x$ into an $n \times n$ diagonal matrix. 2E-3 stands for $2 \cdot 10^{-3} = 0.002$.
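To make the link between the Riccati system (5) and the yields in (6) concrete, the following minimal sketch treats only the one-factor Gaussian case ($d = 1$, $m = 0$), where (5) reduces to two scalar equations. The function names, the Runge-Kutta integrator and all parameter values are illustrative assumptions and are not taken from the paper.

```python
import math

def riccati_rhs(psi, b, beta, sigma, gamma0, gammax):
    """Right-hand sides of (5) for d = 1, m = 0 (one Gaussian factor, Q-parameters)."""
    dphi = 0.5 * sigma ** 2 * psi ** 2 + b * psi - gamma0
    dpsi = beta * psi - gammax
    return dphi, dpsi

def solve_riccati(tau, b, beta, sigma, gamma0, gammax, steps=2000):
    """Integrate (Phi, Psi) from 0 to tau with classical RK4, started at u = 0."""
    h = tau / steps
    phi, psi = 0.0, 0.0
    args = (b, beta, sigma, gamma0, gammax)
    for _ in range(steps):
        k1p, k1s = riccati_rhs(psi, *args)
        k2p, k2s = riccati_rhs(psi + 0.5 * h * k1s, *args)
        k3p, k3s = riccati_rhs(psi + 0.5 * h * k2s, *args)
        k4p, k4s = riccati_rhs(psi + h * k3s, *args)
        phi += h * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
        psi += h * (k1s + 2.0 * k2s + 2.0 * k3s + k4s) / 6.0
    return phi, psi

def model_yield(x, tau, **params):
    """Model yield from (6): -(Phi(tau, 0) + Psi(tau, 0) * x) / tau."""
    phi, psi = solve_riccati(tau, **params)
    return -(phi + psi * x) / tau

# Hypothetical parameter values, chosen only for illustration.
params = dict(b=0.02, beta=-0.5, sigma=0.01, gamma0=0.01, gammax=1.0)
phi5, psi5 = solve_riccati(5.0, **params)
# Closed-form solution of the linear equation for Psi in this special case.
psi_closed = -params["gammax"] * (math.exp(params["beta"] * 5.0) - 1.0) / params["beta"]
short_rate = params["gamma0"] + params["gammax"] * 0.03
```

For very short maturities the model yield approaches the short rate in (2), which gives a simple sanity check on any numerical solver for (5).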

3 Moments and Polynomial Processes 

Since the goal of this paper is to estimate the model parameters by means of the GMM, we have to obtain the moments of the yields. Section [3.1] uses a recent theory for polynomial processes to obtain a closed form expression for the moments of the latent process $(X(t))_{t \ge 0}$. In Section [3.2] we derive the exact moments for the model yields of an affine term structure model with diagonal diffusion term. Finally, Section [3.3] deals with the case of empirical data, when the number of yields observed is larger than the dimension of $(X(t))_{t \ge 0}$ and thus the yields observed cannot be matched exactly with the model yields derived in (6).

^3 Ordinary differential equations similar to (5) have already been investigated in Duffie and Kan (1996) and Duffie et al. (2003).
3.1 Polynomial Processes

Based on the results of Cuchiero et al. (2012) on $p$-polynomial Markov processes, this subsection derives the conditional moments of the latent process $(X(t))_{t \ge 0}$. Let us consider a time homogeneous Markov process $(X(t))_{t \ge 0}$, started at $X(0) = x \in \mathcal{S}$, where the state space $\mathcal{S}$ is a closed subset of $\mathbb{R}^d$. The semigroup $(P_t)_{t \ge 0}$ described by

$$P_t f(x) = E\left(f(X(t)) \mid X(0) = x\right) = \int_{\mathcal{S}} f(\xi)\, p_t(x, d\xi) \tag{7}$$

is defined on all functions $f: \mathcal{S} \to \mathbb{R}$ that are integrable with respect to the Markov kernels $p_t(x, \cdot)$. For an affine term structure model we need moments of $(X(t))$ for a process "started" at $X(s) = x$, $t > s$. Given the filtration $(\mathcal{F}_t)_{t \ge 0}$ and the assumption that $(X(t))$ is a homogeneous Markov process, the conditional expectation of $f(X(t))$, when the process is started at $X(s) = x$, is given by $E(f(X(t)) \mid X(s) = x) = P_{t-s} f(x)$ (see, e.g., Klenke, 2008, Theorem 17.9).


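Next, let $\mathcal{P}_{\le p}(\mathcal{S})$ be the finite dimensional vector space of polynomials on $\mathcal{S}$ up to degree $p \ge 0$, i.e.

$$\mathcal{P}_{\le p}(\mathcal{S}) = \left\{ x \mapsto \sum_{k=0}^{p} c_k' x^k \;:\; c_k \in \mathbb{R}^{d_k} \right\}, \quad \text{where } x^k = \left( x_1^{l_{11}^{(k)}} \cdots x_d^{l_{1d}^{(k)}}, \; \dots, \; x_1^{l_{d_k 1}^{(k)}} \cdots x_d^{l_{d_k d}^{(k)}} \right)' \text{ and } d_k = \binom{k + d - 1}{k}. \tag{8}$$

For $i = 1, \dots, d_k$ and $j = 1, \dots, d$ the exponents $l_{ij}^{(k)}$ in the expression for $x^k$ satisfy $l_{ij}^{(k)} \in \mathbb{N}_0$ as well as $\sum_{j=1}^{d} l_{ij}^{(k)} = k$.^4 In affine term structure models the basis of $\mathcal{P}_{\le p}(\mathcal{S})$ is given by $(1, x', (x^2)', \dots, (x^p)')'$ and thus its dimension is $N = \sum_{k=0}^{p} d_k$. In addition, the Markov process $(X(t))_{t \ge s}$ with $X(s) = x \in \mathcal{S}$ is called $p$-polynomial if for all $f \in \mathcal{P}_{\le p}(\mathcal{S})$ and $t \ge s$

$$P_{t-s} f(x) = E\left(f(X(t)) \mid X(s) = x\right) \in \mathcal{P}_{\le p}(\mathcal{S}). \tag{9}$$

^4 For example, for $d = 3$ and $k = 2$ we have the following: $x^2 = (x_1^2, x_1 x_2, x_1 x_3, x_2^2, x_2 x_3, x_3^2)'$, $d_2 = 6$ and thus (i) $l_{11}^{(2)} = 2$, $l_{12}^{(2)} = 0$, $l_{13}^{(2)} = 0$, (ii) $l_{21}^{(2)} = 1$, $l_{22}^{(2)} = 1$, $l_{23}^{(2)} = 0$, (iii) $l_{31}^{(2)} = 1$, $l_{32}^{(2)} = 0$, $l_{33}^{(2)} = 1$, (iv) $l_{41}^{(2)} = 0$, $l_{42}^{(2)} = 2$, $l_{43}^{(2)} = 0$, (v) $l_{51}^{(2)} = 0$, $l_{52}^{(2)} = 1$, $l_{53}^{(2)} = 1$, (vi) $l_{61}^{(2)} = 0$, $l_{62}^{(2)} = 0$, $l_{63}^{(2)} = 2$.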
That is to say, if $f(x)$ is polynomial, then $E(f(X(t)) \mid X(s) = x)$ is polynomial as well. Cuchiero et al. (2012) [Theorem 2.7] have shown that a time homogeneous Markov process $(X(t))$ is $p$-polynomial if and only if there exists a linear map $A$ on $\mathcal{P}_{\le p}(\mathcal{S})$ such that $P_{t-s}$ restricted on $\mathcal{P}_{\le p}(\mathcal{S})$ can be written as $P_{t-s}|_{\mathcal{P}_{\le p}} = \exp((t - s)A)$.^5 Equipped with this mathematical tool and by means of (7), the conditional expectation $E(f(X(t)) \mid X(s) = x)$, for $t \ge s$ and $f \in \mathcal{P}_{\le p}(\mathcal{S})$, can be derived by means of

$$E\left(f(X(t)) \mid X(s) = x\right) = \exp((t - s)A)\, f(x). \tag{10}$$

The conditional expectations of $f(X(t))$ given $X(s) = x$ can be derived by obtaining the $N \times N$ matrix $A$, where $N = \sum_{k=0}^{p} d_k$, from the generator (see Cuchiero et al., 2012, Theorem 2.9)

$$\mathcal{G} f(x) = \sum_{i=1}^{d} \left(b^P + \beta^P x\right)_i \frac{\partial f(x)}{\partial x_i} + \frac{1}{2} \sum_{i,j=1}^{d} \left(a + \sum_{k=1}^{d} x_k \alpha_k\right)_{ij} \frac{\partial^2 f(x)}{\partial x_i \partial x_j}. \tag{11}$$



To obtain the moments of $(X(t))$ we set $f(X(t)) = [X(t)^k]_i$ for $k = 1, \dots, p$ and $i = 1, \dots, d_k$. As already stated above, if the dimension of $X(t)$ is larger than one, then $X(t)^k = \left( X_1(t)^{l_{11}^{(k)}} \cdots X_d(t)^{l_{1d}^{(k)}}, \dots, X_1(t)^{l_{d_k 1}^{(k)}} \cdots X_d(t)^{l_{d_k d}^{(k)}} \right)'$, where $l_{ij}^{(k)} \in \mathbb{N}_0$, $\sum_{j=1}^{d} l_{ij}^{(k)} = k \le p$, $i = 1, \dots, d_k$ and $j = 1, \dots, d$. In more detail, we consider the basis $(e_1, \dots, e_N) = (1, x', (x^2)', \dots, (x^p)')$. By applying the extended generator $\mathcal{G}$ to the basis element $e_i$, we get the $i$-th row of the $N \times N$ matrix $A$ by means of

$$\mathcal{G} e_i = \sum_{j=1}^{N} A_{ij}\, e_j. \tag{12}$$

The left hand side has been calculated by applying (11) to the corresponding basis element. Then $A_{ij}$ follows from (12) simply by comparing coefficients. This finally results in

$$E\left(X(t)^k \mid X(s) = x\right) = \left( 0_{d_k \times \sum_{j=0}^{k-1} d_j}, \; I_{d_k}, \; 0_{d_k \times \sum_{j=k+1}^{p} d_j} \right) \exp((t - s)A)\, \left(1, x', (x^2)', \dots, (x^p)'\right)', \tag{13}$$

where $I_{d_k}$ is the $d_k \times d_k$ identity matrix and $t \ge s$.

^5 Note that $P_{t-s} f(x) = \exp((t - s)A) f(x)$ also solves the Kolmogorov backward equation $\partial_t u(t - s, x) = \mathcal{G} u(t - s, x)$, where $\mathcal{G}$ is an extended generator as described in Cuchiero et al. (2012) [Definition 2.3]. This follows from the proof of Cuchiero et al. (2012) [Theorem 2.7].


3.2 Dai and Singleton (2000)-Models and Moments of the Latent Process

To proceed with an identified model, we work with affine models where the diffusion term can be diagonalized. For this sub-class, Dai and Singleton (2000) provided sufficient conditions for identification.^6 In this case, the affine process $(X(t))_{t \ge 0}$ follows the stochastic differential equation

$$dX(t) = \left(b^Q + \beta^Q X(t)\right) dt + \Sigma \sqrt{S(X(t))}\, dW^Q(t), \quad \text{where } S_{ii}(X(t)) = B_i^0 + (B_i^x)' X(t), \quad S_{ij}(X(t)) = 0, \quad \text{for } i, j = 1, \dots, d, \; i \ne j, \tag{14}$$

and $\Sigma = \mathrm{diag}(\Sigma_1, \dots, \Sigma_d)$ such that $\Sigma_i = [\Sigma]_{ii} > 0$.

Equation (14) is a special case of (3). The elements of the $d$-dimensional vector $B^0$ are $B_i^0$. $B^x$ is a $d \times d$ matrix, where the $d \times 1$ vector $B_i^x$ is the $i$-th column of this matrix; i.e., $B^x = (B_1^x, \dots, B_d^x)$ with $B_i^x = (B_{1i}^x, \dots, B_{di}^x)'$, $i = 1, \dots, d$. Since $\Sigma$ and $S(X(t))$ are diagonal matrices we obtain $a(X(t)) = \Sigma^2 S(X(t)) = \Sigma^2\, \mathrm{diag}\left(B^0 + (B^x)' X(t)\right)$. The diagonal elements of the $d \times d$ diagonal matrix $a$ are given by $a_{ii} = \Sigma_i^2 B_i^0$, $i = 1, \dots, d$, and the diagonal elements of the $d \times d$ diagonal matrices $\alpha_i$, $i = 1, \dots, d$, are $\Sigma_1^2 B_{i1}^x, \Sigma_2^2 B_{i2}^x, \dots, \Sigma_d^2 B_{id}^x$. For $\beta^Q$ and $B^x$, Dai and Singleton (2000) require

$$\beta^Q = \begin{pmatrix} \beta_{II}^Q & 0_{m \times n} \\ \beta_{JI}^Q & \beta_{JJ}^Q \end{pmatrix} \quad \text{and} \quad B^x = \begin{pmatrix} I_m & B_{IJ}^x \ge 0 \\ 0_{n \times m} & 0_{n \times n} \end{pmatrix}. \tag{15}$$

^6 For example, the $A_1(3)$ model, which will be presented in equation (18), has 19 parameters under $Q$. Dai and Singleton (2000) have shown that the same term structure can be obtained with different parameters, i.e., the model is not identified. Given the Dai and Singleton (2000) conditions for identification, only 14 parameters are allowed to be free parameters. Regarding the diagonal diffusion matrix, Cheridito et al. (2008) [Theorem 2.1] provide conditions where a transformation of a general affine model (3) to an affine model with diagonal $a(x)$ exists. For $d \le 3$ this is always the case.
The drift term can also be considered in the form $-\beta^Q\left(\theta^Q - X(t)\right) dt$ and thus $b^Q = -\beta^Q \theta^Q$. In the following, $\theta^Q = -(\beta^Q)^{-1} b^Q$ is a vector of dimension $d$, partitioned into $\theta_I^Q$ and $\theta_J^Q$, where the first term is of dimension $m$ while the second term is of dimension $n$; i.e., $\theta_I^Q \in \mathbb{R}^m$, $\theta_J^Q \in \mathbb{R}^n$, and thus $\theta^Q = \left((\theta_I^Q)', (\theta_J^Q)'\right)'$. The same partition is applied also to $X(t)$. This leads to the following.


Definition 1 (Dai and Singleton (2000) canonical representation of an $A_m(d)$ model). Consider (14) with diagonal diffusion matrix and the short-rate model (2). Admissibility and identification require the following:

(i)-(a) For $m > 0$, $\beta^Q$ is of the structure given by (15), where in addition $\beta_{ij}^Q \ge 0$ for $1 \le j \le m$ and $i \ne j$. Furthermore, $\theta_I^Q > 0$, $\theta_J^Q = 0$ and $\beta_{II}^Q \theta_I^Q < 0$.

(i)-(b) For $m = 0$, $\beta^Q$ is a lower (or upper) triangular matrix (Dai and Singleton, 2000, p. 1948).

(ii) $\Sigma = I_d$.

(iii) $\gamma_0$ and $\gamma_{xi}$ are unrestricted for $i \in I$, while $\gamma_{xj} \ge 0$ for $j \in J$.

(iv) $B^0 = (0_{1 \times m}, e_{1 \times n})'$ and $B^x$ is of the structure provided by (15).

If the admissibility conditions (i)-(iv) for the affine process $(X(t))_{t \ge 0}$ are satisfied, then model (14) with diagonal diffusion term will be called an $A_m(d)$ model.


Definition [1](i)-(a) implies that $b_i^Q = -\sum_{j=1}^{m} \beta_{ij}^Q \theta_j^Q > 0$, for $i = 1, \dots, m$, and thus the first $m$ elements of $b^Q$ are strictly positive and the last $n$ elements of $b^Q$ are negative. Namely

$$b^Q = \begin{pmatrix} b_I^Q \\ b_J^Q \end{pmatrix} = \begin{pmatrix} -\beta_{II}^Q \theta_I^Q > 0 \\ -\beta_{JI}^Q \theta_I^Q \le 0 \end{pmatrix}. \tag{16}$$

This implies that the diagonal elements of $\beta_{II}^Q$ are negative. We slightly deviate from the canonical representation in Definition [1] by assuming $\Sigma$ to be a diagonal matrix with entries $\Sigma_i > 0$ and $\gamma_x = $ […]

^7 Note that the canonical representation of Dai and Singleton (2000) is one of many representations where the admissibility and identification conditions are met. The Appendix of Dai and Singleton (2000) presents affine linear transformations $\mathcal{A}_A X(t) = L_A X(t) + l_A$ where the model is still admissible and identified.
Since $\theta_J^Q$ is restricted to zero, not all elements of $\beta^Q$ and $b^Q$ can be unrestricted. In the estimation procedure we account for this fact by using $\theta^Q$ as a parameter. Then $b^Q = -\beta^Q \theta^Q$.

Now we apply the tools developed in Section [3.1] to $A_m(d)$ models. To observe how this works we first derive matrix $A$ for the Vasicek (1977) and the Cox et al. (1985) model. Then we calculate $A$ for an $A_m(d)$ model for arbitrary $0 \le m \le d$ and $d \le 3$. Matrix $A$, for $d = 3$, is presented in Appendix [B] as for $p = 4$ moments its dimension becomes large ($35 \times 35$).


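Let us start with the Vasicek (1977) model, where $d = 1$ and $m = 0$ such that $(X(t))$ follows an Ornstein-Uhlenbeck process $dX(t) = (b^P + \beta^P X(t))\, dt + \Sigma\, dW^P(t)$. For this model the generator of Markov-transition probabilities $\mathcal{G}$ is given by

$$\mathcal{G} f(x) = (b^P + \beta^P x) \frac{d f(x)}{dx} + \frac{1}{2} \Sigma^2 \frac{d^2 f(x)}{dx^2}. \tag{17}$$

Consider the basis $1, x, x^2, \dots, x^p$. The linear map $A$ used to derive the moments of order $\le p$ (under $P$) is given by the $(p + 1) \times (p + 1)$ matrix

$$A = \begin{pmatrix}
0 & 0 & 0 & 0 & \cdots & 0 \\
b^P & \beta^P & 0 & 0 & \cdots & 0 \\
\Sigma^2 & 2 b^P & 2 \beta^P & 0 & \cdots & 0 \\
0 & 3 \Sigma^2 & 3 b^P & 3 \beta^P & \cdots & 0 \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & \frac{p(p-1)}{2} \Sigma^2 & p b^P & p \beta^P
\end{pmatrix},$$

where the row belonging to $x^k$, $2 \le k \le p$, contains $\frac{k(k-1)}{2} \Sigma^2$, $k b^P$ and $k \beta^P$ in the columns belonging to $x^{k-2}$, $x^{k-1}$ and $x^k$, and zeros elsewhere.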
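The construction of $A$ for the Vasicek model can be checked numerically. The sketch below builds the $(p+1) \times (p+1)$ matrix from the generator (17), evaluates $\exp((t-s)A)$ applied to the basis $(1, x, \dots, x^p)'$ as in (10), and compares the first two conditional moments with the well-known closed-form Ornstein-Uhlenbeck expressions. The helper names, the truncated-Taylor matrix exponential and the parameter values are assumptions made for illustration only.

```python
import math

def vasicek_generator_matrix(b, beta, sigma, p):
    """Matrix A with G x^k = sum_j A[k][j] x^j for the generator (17)."""
    A = [[0.0] * (p + 1) for _ in range(p + 1)]
    for k in range(1, p + 1):
        A[k][k] = k * beta
        A[k][k - 1] = k * b
        if k >= 2:
            A[k][k - 2] = 0.5 * k * (k - 1) * sigma ** 2
    return A

def mat_mul(X, Y):
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def mat_exp(A, t, squarings=10, terms=20):
    """exp(t * A) via scaling and squaring with a truncated Taylor series."""
    n = len(A)
    B = [[t * a / 2 ** squarings for a in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for nu in range(1, terms + 1):
        term = [[v / nu for v in row] for row in mat_mul(term, B)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def conditional_moments(A, x, dt):
    """E(X(t)^k | X(s) = x) for k = 0, ..., p with dt = t - s."""
    E = mat_exp(A, dt)
    basis = [x ** k for k in range(len(A))]
    return [sum(E[k][j] * basis[j] for j in range(len(A))) for k in range(len(A))]

# Hypothetical parameter values, chosen only for illustration.
b, beta, sigma, x0, dt = 0.02, -0.5, 0.1, 0.1, 0.25
moments = conditional_moments(vasicek_generator_matrix(b, beta, sigma, 4), x0, dt)
mean_closed = x0 * math.exp(beta * dt) + (b / beta) * (math.exp(beta * dt) - 1.0)
var_closed = sigma ** 2 * (math.exp(2.0 * beta * dt) - 1.0) / (2.0 * beta)
```

The same routine applies to any polynomial-preserving model once its matrix $A$ has been obtained by comparing coefficients.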


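For the Cox et al. (1985) model, where $d = 1$ and $m = 1$, $(X(t))$ follows a square-root process $dX(t) = (b^P + \beta^P X(t))\, dt + \Sigma \sqrt{X(t)}\, dW^P(t)$. The generator of Markov-transition probabilities $\mathcal{G}$ is given by

$$\mathcal{G} f(x) = (b^P + \beta^P x) \frac{d f(x)}{dx} + \frac{1}{2} \Sigma^2 x \frac{d^2 f(x)}{dx^2}$$

such that the linear map $A$ is given by the $(p + 1) \times (p + 1)$ matrix

$$A = \begin{pmatrix}
0 & 0 & 0 & 0 & \cdots & 0 \\
b^P & \beta^P & 0 & 0 & \cdots & 0 \\
0 & 2 b^P + \Sigma^2 & 2 \beta^P & 0 & \cdots & 0 \\
0 & 0 & 3 b^P + 3 \Sigma^2 & 3 \beta^P & \cdots & 0 \\
\vdots & & & \ddots & \ddots & \vdots \\
0 & \cdots & & 0 & p b^P + \frac{p(p-1)}{2} \Sigma^2 & p \beta^P
\end{pmatrix},$$

where the row belonging to $x^k$ contains $k b^P + \frac{k(k-1)}{2} \Sigma^2$ and $k \beta^P$ in the columns belonging to $x^{k-1}$ and $x^k$, for $1 \le k \le p$. For an $A_1(3)$ model, where $d = 3$ and $m = 1$, $(X(t))$ follows a stochastic process containing one square root component. Let us start with the model under $Q$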


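$$dX(t) = \left[ \begin{pmatrix} b_1^Q = -\beta_{11}^Q \theta_1^Q > 0 \\ b_2^Q = -\beta_{21}^Q \theta_1^Q \le 0 \\ b_3^Q = -\beta_{31}^Q \theta_1^Q \le 0 \end{pmatrix} + \begin{pmatrix} \beta_{11}^Q < 0 & 0 & 0 \\ \beta_{21}^Q \ge 0 & \beta_{22}^Q & \beta_{23}^Q \\ \beta_{31}^Q \ge 0 & \beta_{32}^Q & \beta_{33}^Q \end{pmatrix} X(t) \right] dt + \begin{pmatrix} \Sigma_1 \sqrt{X_1(t)} & 0 & 0 \\ 0 & \Sigma_2 \sqrt{1 + B_{12}^x X_1(t)} & 0 \\ 0 & 0 & \Sigma_3 \sqrt{1 + B_{13}^x X_1(t)} \end{pmatrix} dW^Q(t). \tag{18}$$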

The Dai and Singleton (2000) restrictions discussed above yield: $\theta_1^Q > 0$ and $\beta_{11}^Q < 0$, $B_{12}^x, B_{13}^x \ge 0$, and $\Sigma_1, \Sigma_2, \Sigma_3 > 0$. Note that (18) has 13 parameters while under $Q$ we can identify 14 parameters. These parameters are the thirteen parameters in (18) and $\gamma_0$ arising in (2). The same structure is assumed under $P$. Based on Cheridito et al. (2007) this extended affine market price of risk specification is mathematically well defined given that $b_I^P = b_1^P > 0$, $b_J^P = (b_2^P, b_3^P)' \le 0$, and eight additional parameters $\beta_{11}^P < 0$, $\beta_{21}^P \ge 0$, $\beta_{31}^P \ge 0$, $\beta_{22}^P$, $\beta_{23}^P$, $\beta_{32}^P$, $\beta_{33}^P$ contained in $\beta^P$ and $\theta_1^P > 0$ contained in $\theta^P$, where $\theta_2^P = \theta_3^P = 0$. Then $b^P = -\beta^P \theta^P$. Since $\theta_J^Q = \theta_J^P = 0_2$ for the $A_1(3)$ model considered, we write $\theta^Q$ and $\theta^P$ instead of $\theta_1^Q$ and $\theta_1^P$ in the following. By collecting these parameters (not subject to an equality restriction), we obtain the vector of model parameters $\vartheta_{A_1(3)} \in \mathbb{R}^{22}$.

By means of (18) and the extended affine market price of risk assumption the generator becomes

$$\mathcal{G} f(x) = \sum_{i=1}^{3} \left( b_i^P + \sum_{j=1}^{3} \beta_{ij}^P x_j \right) \frac{\partial f(x)}{\partial x_i} + \frac{1}{2} \sum_{i=1}^{3} \Sigma_i^2 \left( B_i^0 + B_{1i}^x x_1 \right) \frac{\partial^2 f(x)}{\partial x_i^2}, \tag{19}$$

with $B^0 = (0, 1, 1)'$ and $B_{11}^x = 1$.

^8 In more detail: $\beta^Q$ (7 parameters), $\theta_1^Q$ (1 parameter, which is $> 0$ while $\theta_2^Q = \theta_3^Q = 0$, and thus $b^Q = -(\beta_{11}^Q, \beta_{21}^Q, \beta_{31}^Q)' \theta_1^Q$), $\Sigma$ (3 parameters; only the elements in the main diagonal are positive, the other parameters are zero), $B_{12}^x \ge 0$ and $B_{13}^x \ge 0$.


The conditional expectation $E(f(X(t)) \mid X(s) = x)$ for $f \in \mathcal{P}_{\le p}(\mathcal{S})$ follows from Section [3.1]. In particular, the conditional moments $E(X(t)^k \mid X(s) = x)$, $t \ge s$, can be derived by means of (13), where $A$ is a matrix of dimension $N \times N$. We shall consider the first four moments, i.e., $p = 4$. The number of moments, $N$, follows from the multinomial coefficients. Regarding the basis elements $e_j$, $j = 1, \dots, N$, of our polynomial, we choose the basis $(1 \mid x_1, x_2, x_3 \mid x_1^2, x_1 x_2, \dots, x_3^2 \mid x_1^3, \dots, x_3^3 \mid x_1^4, \dots, x_3^4)$. In this expression we have separated the terms of different power by means of $\mid$. Matrix $A$ is derived by comparing coefficients, such that $\mathcal{G} e_j = \sum_{i=1}^{N} A_{ji} e_i$ for $j = 1, \dots, N$, where $A_{ji} = [A]_{ji}$. With $(X(t))$ of dimension 3, we get one term for $k = 0$, three terms for $k = 1$, six for $k = 2$, ten for $k = 3$ and fifteen for $k = 4$. Therefore $N = 35$. Restricting the corresponding model parameters provides us with the matrix $A$ for an $A_1(3)$ model.
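The count $N = 35$ and the ordering of the monomials within each block of the basis can be reproduced with a few lines of code. The sketch below is only an illustration; the function names are hypothetical, and the ordering produced (lexicographic in the variable indices) is the one used in the $d = 3$, $k = 2$ example, which is assumed here to carry over to higher degrees.

```python
from itertools import combinations_with_replacement
from math import comb

def monomial_exponents(d, k):
    """Exponent tuples (l_1, ..., l_d) with l_1 + ... + l_d = k.

    The tuples are ordered as x_1^k, x_1^(k-1) x_2, ..., x_d^k.
    """
    exponents = []
    for combo in combinations_with_replacement(range(d), k):
        exponents.append(tuple(combo.count(j) for j in range(d)))
    return exponents

def basis_dimension(d, p):
    """N = sum_{k=0}^{p} d_k with d_k = binom(k + d - 1, k)."""
    return sum(comb(k + d - 1, k) for k in range(p + 1))

block_sizes = [len(monomial_exponents(3, k)) for k in range(5)]
degree_two_block = monomial_exponents(3, 2)
N = basis_dimension(3, 4)
```

Each exponent tuple identifies one basis element $e_j$, so the list of tuples can serve as the row and column index of $A$ when coefficients are compared.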

In the remaining part of this article we stick to the following assumption.


Assumption 1. The background driving process $(X(t))$ is stationary.


Sufficient conditions for a stationary process $(X(t))$ are provided in Glasserman and Kim (2010). For $A_m(d)$ models, when $d \le 3$, sufficient conditions for a stationary process are also reported in Aït-Sahalia and Kimmel (2010) and in Appendix [E].

For a stationary $(X(t))$, we get $E(X(t)) = \theta^P$. In addition, to obtain higher order moments, we use the following abbreviations: $\tilde{x} = (1, (x^1)', (x^2)', \dots, (x^p)')'$, which is of dimension $N$, while $\tilde{x}_{2:N} = ((x^1)', (x^2)', \dots, (x^p)')'$ is an $(N - 1)$-dimensional vector. $\tilde{X}(t)$ and $\tilde{X}(t)_{2:N}$ are defined in the same way.
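Since $E(\tilde{X}(t)) = E\left(E(\tilde{X}(t) \mid X(s))\right)$, for $0 \le s < t$, by the tower rule we obtain

$$E \tilde{X}(t) = \begin{pmatrix} 1 \\ E \tilde{X}(t)_{2:N} \end{pmatrix} = E\left(\exp((t - s)A)\, \tilde{X}(s)\right) = \exp((t - s)A)\, E\left(\tilde{X}(s)\right) = \begin{pmatrix} 1 & 0_{1 \times (N-1)} \\ [\exp((t - s)A)]_{2:N,1} & [\exp((t - s)A)]_{2:N,2:N} \end{pmatrix} \begin{pmatrix} 1 \\ E \tilde{X}(t)_{2:N} \end{pmatrix}, \tag{20}$$

where the last equality uses $E \tilde{X}(s) = E \tilde{X}(t)$ by stationarity, and where the $N \times N$ matrix $\exp((t - s)A)$ can be partitioned into four blocks: (i) north-western $[\exp((t - s)A)]_{1,1} = 1$, (ii) north-eastern $[\exp((t - s)A)]_{1,2:N} = 0_{1 \times (N-1)}$, (iii) south-western $[\exp((t - s)A)]_{2:N,1}$ and (iv) south-eastern $[\exp((t - s)A)]_{2:N,2:N}$.^9 Hence, the (unconditional) moments of order 1 to $p$ follow from

$$E\left(\tilde{X}(t)_{2:N}\right) = \left(I_{N-1} - [\exp((t - s)A)]_{2:N,2:N}\right)^{-1} [\exp((t - s)A)]_{2:N,1}. \tag{21}$$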
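Equation (21) can be illustrated with the univariate square-root model, for which the first two stationary moments are known in closed form (a Gamma distribution with mean $-b^P/\beta^P$ and variance $\Sigma^2 (-b^P/\beta^P) / (-2\beta^P)$). The sketch below builds $A$ for that model, forms the blocks of $\exp((t-s)A)$ and solves the linear system in (21). All helper names, the plain Gaussian elimination and the parameter values are assumptions chosen for illustration; they are not the implementation used in the paper.

```python
def cir_generator_matrix(b, beta, sigma, p):
    """Matrix A of the square-root model on the basis 1, x, ..., x^p."""
    A = [[0.0] * (p + 1) for _ in range(p + 1)]
    for k in range(1, p + 1):
        A[k][k] = k * beta
        A[k][k - 1] = k * b + 0.5 * k * (k - 1) * sigma ** 2
    return A

def mat_mul(X, Y):
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def mat_exp(A, t, squarings=10, terms=20):
    """exp(t * A) via scaling and squaring with a truncated Taylor series."""
    n = len(A)
    B = [[t * a / 2 ** squarings for a in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for nu in range(1, terms + 1):
        term = [[v / nu for v in row] for row in mat_mul(term, B)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [M[i][:] + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (aug[i][n] - sum(aug[i][j] * z[j] for j in range(i + 1, n))) / aug[i][i]
    return z

def stationary_moments(A, dt=1.0):
    """Unconditional moments of order 1, ..., p from equation (21)."""
    E = mat_exp(A, dt)
    n = len(A)
    lhs = [[float(i == j) - E[i][j] for j in range(1, n)] for i in range(1, n)]
    rhs = [E[i][0] for i in range(1, n)]
    return solve(lhs, rhs)

# Hypothetical parameter values, chosen only for illustration.
b, beta, sigma = 0.03, -0.5, 0.1
m = stationary_moments(cir_generator_matrix(b, beta, sigma, 4))
theta = -b / beta
second_closed = theta ** 2 + sigma ** 2 * theta / (-2.0 * beta)
```

The result does not depend on the chosen step $t - s$, since (21) holds for every $t > s$ under stationarity.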


3.3 Moments of the Observed Yields 

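The previous section provided us with the moments of the latent process $(X(t))$. By means of (6) the model yields are

$$y^0(t, \tau) = -\frac{1}{\tau}\left(\Phi(\tau, 0) + \Psi(\tau, 0)' X(t)\right).$$

Now we have to account for the fact that real world data cannot be observed on a continuous time scale, but only on a discrete grid $\Delta, 2\Delta, \dots, t\Delta, \dots, T\Delta$, where $T$ is the time series dimension and $\Delta$ is the step-width. We set $\Delta = 1$ and assume that $X_t$ stands for $X(t\Delta)$. Additionally, the maturities $\tau$ available are given by $\tau = (\tau_1, \dots, \tau_M)'$, where $M$ is the number of maturities observed. For model yields with a maturity $\tau_i \in \{\tau_1, \dots, \tau_M\}$ observed at $t = t\Delta$ we use the notation $y_{ti}^0$, $i = 1, \dots, M$. Since $M$ yields cannot be matched exactly by $d < M$ factors, we add the noise term $\varepsilon_{ti}$ and arrive at the yields observed

$$y_{ti} = y_{ti}^0 + \varepsilon_{ti} = -\frac{1}{\tau_i}\left(\Phi(\tau_i, 0) + \Psi(\tau_i, 0)' X_t\right) + \varepsilon_{ti}, \quad i = 1, \dots, M, \; t = 1, \dots, T.$$

With $M$ maturities $\tau = (\tau_1, \dots, \tau_M)$ we define

$$\tilde{\Phi} = \begin{pmatrix} -\Phi(\tau_1, 0)/\tau_1 \\ \vdots \\ -\Phi(\tau_M, 0)/\tau_M \end{pmatrix} \in \mathbb{R}^{M}, \quad \tilde{\Psi} = \begin{pmatrix} -\Psi(\tau_1, 0)'/\tau_1 \\ \vdots \\ -\Psi(\tau_M, 0)'/\tau_M \end{pmatrix} \in \mathbb{R}^{M \times d} \quad \text{and} \quad \varepsilon_t = \begin{pmatrix} \varepsilon_{t1} \\ \vdots \\ \varepsilon_{tM} \end{pmatrix} \in \mathbb{R}^{M},$$

such that the $M$-dimensional vector of yields, $y_t = (y_{t1}, \dots, y_{tM})'$, is given by

$$y_t = \tilde{\Phi} + \tilde{\Psi} X_t + \varepsilon_t \in \mathbb{R}^{M}. \tag{22}$$

^9 Note that $\exp((t - s)A)$ and $A$ are of the same structure. This follows from the power series representation of the matrix exponential $\exp((t - s)A) = \sum_{\nu=0}^{\infty} \frac{1}{\nu!} ((t - s)A)^{\nu}$. In addition, the existence of $\left(I_{N-1} - [\exp((t - s)A)]_{2:N,2:N}\right)^{-1}$ follows from the properties of the matrix exponential.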


Based on (22) we observe that the moments of $y_{ti}$ have to follow from the moments of $X_t$. For the noise term $\varepsilon_{ti}$ we apply the following assumption.

Assumption 2. Let $\varepsilon_{ti}$, $t = 1, \dots, T$, $i = 1, \dots, M$, be independent with zero mean, variance $0 < \sigma_i^2 < +\infty$ and $E(\varepsilon_{ti}^4) < +\infty$. In addition, $|E(\varepsilon_{ti}^p)| < +\infty$ for $i = 1, \dots, M$ and $E(\varepsilon_{ti}^{2\iota - 1}) = 0$ for $\iota = 1, \dots, \lfloor p/2 \rfloor$, where $\lfloor p/2 \rfloor$ is the largest integer smaller than or equal to $p/2$.


Note that by Assumption [2] all maturities are assumed to be observed with noise. In addition, $E(\varepsilon_{ti} \varepsilon_{tj}) = 0$ for $i \ne j$, $i, j = 1, \dots, M$, and $E(\varepsilon_{ti}^p) < +\infty$. By means of equation (22) and Assumption [2] we derive the moments of the empirical yields $E(y_{ti}^k y_{tj}^l) = E\left(([\tilde{\Phi} + \tilde{\Psi} X_t + \varepsilon_t]_i)^k ([\tilde{\Phi} + \tilde{\Psi} X_t + \varepsilon_t]_j)^l\right)$, where $0 \le k + l \le p$ and $[\cdot]_i$ extracts the $i$-th element of a vector. Hence, we derive the first four moments of the yields observed, i.e. $E(y_{ti}^k)$, $k = 1, \dots, 4$. In addition, applications in finance often take the auto-covariance of the yields, $E(y_{ti} y_{t-1,j})$, and the auto-covariance of the squared yields, $E(y_{ti}^2 y_{t-1,j}^2)$, into consideration ("indicator for volatility clustering"; see, e.g., the discussion in Piazzesi, 2010 [p. 649]). Therefore also the terms $E(y_{ti} y_{t-1,i})$ and $E(y_{ti}^2 y_{t-1,i}^2)$ are calculated. Since this part is straightforward, but tedious algebraic manipulations were necessary to obtain all these moments, we present the results in the Appendix. We put the noise parameters necessary to obtain the moments of the observed yields into the parameter vector $\vartheta_{\sigma}$.
The dimension of $\vartheta_{\sigma}$ depends on how $\sigma_i^2$ is specified and on the moments used in the estimation. If $\sigma_i^2$ is different for each maturity, we have $M$ parameters for the second moments of the noise. If, in addition, the fourth moments of the yields are calculated, the fourth moments of the noise enter into the calculations as well, i.e. we get another $M$ parameters for the moments of the noise. In this case the dimension of $\vartheta_{\sigma}$ is $2M$. Since the dimension of the model parameter $\vartheta_{A_1(3)}$ is already over twenty, we continue with a more parsimonious specification of the noise, where $\sigma_i^2 = \sigma^2$ and $E(\varepsilon_{ti}^4)$ is the same for all $i = 1, \dots, M$. Hence, the dimension of $\vartheta_{\sigma}$ is two if fourth moments are required in the calculation of the yields observed, otherwise it is one. This results in the model parameter vector $\vartheta = (\vartheta_{A_1(3)}', \vartheta_{\sigma}')'$ of dimension $p$, which is contained in the parameter space $\Theta \subset \mathbb{R}^p$, where due to the Dai and Singleton (2000) and stationarity restrictions, $\Theta$ is a proper subset of $\mathbb{R}^p$. The components of $\vartheta$ are introduced by the first column of Table [1].

The calculation of the moments also requires solving the Riccati equations (5). For the Vasicek and the Cox-Ingersoll-Ross model closed form solutions are available, as e.g. presented in Filipović (2009) [Chapter 10.3.2]. For $A_m(d)$ models, however, $\Phi$ and $\Psi$ have to be derived by means of numerical tools in general.^10 In this paper we follow a computationally efficient way proposed by Grasselli and Tebaldi (2008) to obtain an (almost) closed form solution for $\Phi(t, u)$ and $\Psi(t, u)$. This methodology requires the matrix $\beta_{II}^Q$ to be diagonal. Given the Dai and Singleton (2000) setup, this implies no further restrictions for $m \le 1$, while for $m \ge 2$ the off-diagonal parts of $\beta_{II}^Q$ have to be set to zero. Appendix [D] shows how $\Phi$ and $\Psi$ could be derived for an $A_m(d)$ model with diagonal $\beta_{II}^Q$ in a numerically parsimonious way.

4 Parameter Estimation and Finite Sample Properties 


4.1 Parameter Estimation 


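By observing yields for maturities $\tau_i$, $i = 1, \dots, M$, in periods $t = 1, \dots, T$, we obtain $M$-variate vectors $y_t = (y_{t1}, \dots, y_{tM})'$, $t = 1, \dots, T$, the observations of the $M$-variate time series $y_{1:T} = (y_1', \dots, y_T')'$, as well as $\tilde{q}$-dimensional vectors $\tilde{m}_{(t)}(y_{1:T}) = (y_{t1}, \dots, y_{tM}^p, y_{t1} y_{t-1,1}, \dots, y_{tM}^2 y_{t-1,M}^2)'$ and

$$\tilde{m}_T(y_{1:T}) = \left( \frac{1}{T} \sum_{t=1}^{T} y_{t1}, \; \dots, \; \frac{1}{T} \sum_{t=1}^{T} y_{tM}^p, \; \frac{1}{T} \sum_{t=2}^{T} y_{t1} y_{t-1,1}, \; \dots, \; \frac{1}{T} \sum_{t=2}^{T} y_{tM}^2 y_{t-1,M}^2 \right)'.$$

^10 See also Duffie and Kan (1996); Dai and Singleton (2000); Chen and Joslin (2012).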
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Let /i(i9)= (E(i/ti),... ,E(yP^),E(ytiyt-i,i), • • • ,E(y,V?/t-i,M))' stand for the corresponding vector 
of moments as a function of the unknown parameter vector £ 0 C M*’. The components of the vector 
/i(i?) are provided in Appendix O fsee equations ([5^ . pSjl . (15^ . (pUjl . IHT]) . ()^ and (liSIl i. 

The generalized method of moments demands for q > p moments to be selected. By means of a q x q 
selector matrix Ad, where = 1 if the corresponding moment is used and zero otherwise, we obtain 

/x(i9) = M. (yi:'r) = Ad lii(t) (YIit) £ E'' and mr (yi:^) = Ad hit (yi:T) £ E''. Next 

we define h(t) (i^iyiir) = ni(t)(yi:T) — and hr {'d]'yi-,T) = mT(yi:r) — as well as the GMM 
distance function 

yi:T) = hT('i?; yiir)^ Ct hT('i?; yiir)- (23) 
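The construction of the sample moments and of the distance function (23) can be illustrated with a small sketch. It assumes, purely for illustration, that only first moments, second moments and first-order lagged products of each yield are used; the function names `sample_moments` and `gmm_distance`, the toy data and the vector `mu` of model-implied moments are hypothetical and not taken from the paper.

```python
def sample_moments(y):
    """Sample analogues of E(y_ti), E(y_ti^2) and E(y_ti * y_{t-1,i}).

    y is a list of T observations; each observation is a list of M yields.
    All sums are divided by T, including the lagged products (t = 2..T).
    """
    T, M = len(y), len(y[0])
    first = [sum(row[i] for row in y) / T for i in range(M)]
    second = [sum(row[i] ** 2 for row in y) / T for i in range(M)]
    lagged = [sum(y[t][i] * y[t - 1][i] for t in range(1, T)) / T for i in range(M)]
    return first + second + lagged


def gmm_distance(m_T, mu, C):
    """Quadratic form h' C h with h = m_T - mu, as in equation (23)."""
    h = [m - u for m, u in zip(m_T, mu)]
    q = len(h)
    return sum(h[i] * C[i][j] * h[j] for i in range(q) for j in range(q))


# Toy data: T = 3 observations of a single yield (M = 1).
y = [[1.0], [2.0], [3.0]]
m_T = sample_moments(y)            # [2, 14/3, 8/3]
mu = [2.0, 4.0, 2.0]               # hypothetical model-implied moments
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
Q = gmm_distance(m_T, mu, identity)
print(m_T, Q)
```

In an actual application `mu` would be the exact model moments evaluated at a candidate parameter vector, and the identity matrix would be replaced by the weighting matrix $C_T$.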


The GMM estimate $\hat\vartheta$ (of $\vartheta$) minimizes $Q_T(\cdot)$ in (23), where $C_T$ is a $q \times q$ symmetric positive semi-definite weighting matrix (see, e.g., Ruud, 2000, Chapters 21-22). In particular, the continuous updating estimator (CUE) is used to obtain an efficient GMM estimate. That is, we run an iterative procedure with iteration steps $m = 1,\dots,M$, where we alternate between (i) updating the parameter estimate to $\hat\vartheta^{(m)}$ based on $Q_T(\cdot)$ given $C_T$ and (ii) updating $C_T$ given $\hat\vartheta^{(m-1)}$ from the previous iteration step $m-1$. The weighting matrix applied is $C_T = \left(\Lambda_T(\hat\vartheta^{(m-1)})\right)^{-1}$, with $\Lambda_T(\vartheta) = \frac{1}{T}\sum_{t=2}^{T} h_{(t)}(\vartheta; y_{1:T})\, h_{(t)}(\vartheta; y_{1:T})^{\top}$. For regularity conditions and further issues on GMM estimation see, e.g., Hansen (1982); Altonji and Segal (1996); Pötscher and Prucha (1997); Windmeijer (2003); Guggenberger and Smith (2003); Newey and Windmeijer (2009).
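The alternation between steps (i) and (ii) can be sketched on a deliberately simple toy problem. The example below is not the term structure model: it assumes an i.i.d. exponential sample with one parameter and two moment conditions, $\mathbb{E}(y) = \theta$ and $\mathbb{E}(y^2) = 2\theta^2$, and it replaces a proper optimizer by a crude grid search. Since the toy moments contain no lags, the sums run over all observations. All function names are hypothetical.

```python
import random


def moment_residuals(theta, y):
    """h_(t) for the toy model: E(y) = theta, E(y^2) = 2 * theta^2."""
    return (y - theta, y * y - 2.0 * theta * theta)


def weighting_matrix(theta, ys):
    """C_T = Lambda_T(theta)^{-1}, Lambda_T being the average outer product of h_(t)."""
    T = len(ys)
    s11 = s12 = s22 = 0.0
    for y in ys:
        a, b = moment_residuals(theta, y)
        s11 += a * a / T
        s12 += a * b / T
        s22 += b * b / T
    det = s11 * s22 - s12 * s12
    return (s22 / det, -s12 / det, s11 / det)   # entries c11, c12, c22 of the inverse


def iterated_gmm(ys, theta_start, n_steps=4):
    T = len(ys)
    m1 = sum(ys) / T
    m2 = sum(y * y for y in ys) / T
    grid = [0.01 * k for k in range(1, 1001)]   # crude grid search on (0, 10]
    theta = theta_start
    for _ in range(n_steps):
        c11, c12, c22 = weighting_matrix(theta, ys)   # step (ii): update C_T

        def distance(th):
            h1, h2 = m1 - th, m2 - 2.0 * th * th
            return c11 * h1 * h1 + 2.0 * c12 * h1 * h2 + c22 * h2 * h2

        theta = min(grid, key=distance)               # step (i): update the estimate
    return theta


rng = random.Random(0)
sample = [rng.expovariate(0.5) for _ in range(5000)]  # true mean theta = 2
estimate = iterated_gmm(sample, theta_start=1.0)
print(estimate)
```

With 23 or more parameters a grid search is infeasible, which is why the choice of the minimization routine becomes the central practical issue in the remainder of this section.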


To satisfy the order condition, the inequality $q \ge p$ has to be fulfilled. For the $A_1(3)$ model considered in Section 3 the dimension of the parameter vector $\vartheta$ is 23 ($p = 23$), if moments of order smaller than four are used. Including fourth order moments of the yields results in $p = 24$. The number of maturities $M$ available is around ten. Therefore, by using the moments $\mathbb{E}(y_{ti})$, $\mathbb{E}(y_{ti}^{2})$ and $\mathbb{E}(y_{ti}y_{t-1,i})$ for $i = 1,\dots,M$, we are already equipped with $3M$ moment conditions. Hence, for $M \ge 8$ the order condition $q \ge p$ is already met. By using the first four moments ($p = 4$) and the auto-covariances (for $M = 10$), the number of moments is much larger than the number of parameters.

To obtain parameter estimates, a high-dimensional nonlinear minimization problem has to be solved and $q$ moment conditions have to be selected from the set of moments available. Regarding the latter issue, it turned out that the instability of the parameter estimates is amplified if higher order moments are added. Due to this instability, using the Wald and the distance difference tests to test for redundant moment conditions (testing for over-identifying restrictions; see, e.g., Ruud, 2000, Chapter 22.2) provides us with very ambiguous results. Hence, the selection of these moments was performed by means of simulation
experiments. Based on the simulation results, we work with q = 27 moment conditions, namely, E(yti), 
f = 1,... ,M = 10, and [E(yty')]ij, for (i,j) = (1,1), (2,2), (3,2), (5,5), (7,7), (9,10) and 

( 10 , 10 ). 

Regarding the minimization of the GMM distance function, we observe that standard minimization procedures designed to find local minima do not result in reliable parameter estimates. In more detail, to investigate the properties of our estimation routine we performed Monte Carlo experiments with simulated yields where $M = 10$, $T = 500$ and the number of simulation runs is 1,000. The parameter vector $\vartheta$ used to generate the yields is presented in the second column of Table 1. The initial values for the GMM estimation, $\vartheta^{(0)}$, are generated as follows: $[\vartheta^{(0)}]_j = [\vartheta]_j + c_\vartheta \zeta_j$ for coordinate $j$, when the support is the real axis, while $[\vartheta^{(0)}]_j = \exp\left(\log|[\vartheta]_j| + c_\vartheta \zeta_j\right)\mathrm{sgn}([\vartheta]_j)$ is used for the elements $j$ living on the non-positive or non-negative part of the real axis. $\zeta_j$ is iid standard normal and the distortion parameter $c_\vartheta$ is set to 0, 0.1, 0.25, 0.5 and 1. Then, parameter estimates are obtained by means of the MATLAB minimization routine fminsearch based on the Nelder-Mead algorithm.^11 With this algorithm an estimate $\hat\vartheta$ is provided by $\hat\vartheta^{(M)}$, where - in this case - $M$ is the last iteration step. We observe that the parameters can be estimated easily by means of this standard minimization tool when $c_\vartheta \le 0.25$; i.e., when the optimization is started sufficiently close to the true parameter $\vartheta$. However, the parameter estimation with $c_\vartheta = 0.5$ or $c_\vartheta = 1$ becomes a difficult problem.^12
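The generation of distorted starting values described above can be sketched as follows. The function name `perturb`, the flag list `unrestricted` and the example vector are hypothetical; the treatment of coordinates that are exactly zero on a restricted support (where $\log|\cdot|$ is undefined) is an assumption made only to keep the sketch well defined.

```python
import math
import random


def perturb(theta, unrestricted, c, rng):
    """Distort a parameter vector to obtain a starting value.

    Coordinates with support on the whole real axis receive additive noise;
    sign-restricted coordinates receive multiplicative log-normal noise, which
    keeps their sign. c plays the role of the distortion parameter.
    """
    start = []
    for value, free in zip(theta, unrestricted):
        zeta = rng.gauss(0.0, 1.0)              # standard normal draw
        if free:
            start.append(value + c * zeta)
        elif value == 0.0:
            start.append(0.0)                   # assumption: keep exact zeros
        else:
            magnitude = math.exp(math.log(abs(value)) + c * zeta)
            start.append(math.copysign(magnitude, value))
    return start


rng = random.Random(1)
theta_true = [10.0, -1.0, 0.2]                  # illustrative values only
flags = [False, False, True]                    # third coordinate is unrestricted
start = perturb(theta_true, flags, 1.0, rng)
print(start)
```

Setting `c` to zero returns the original vector up to rounding, which corresponds to starting the optimizer at the true parameter.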


^11 See http://www.mathworks.de/de/help/matlab/ref/fminsearch.html

^12 By combining multistart random search methods (see, e.g., Törn and Žilinskas, 1989) with the Nelder-Mead algorithm, we observe that the parameter estimates improve. However, performing inference still remains a difficult problem. For more details see the Appendix.

To cope with this problem, we combine multistart random search methods (see, e.g., Törn and Žilinskas, 1989) with Quasi-Bayesian methods (see Chernozhukov and Hong, 2003). For each Monte Carlo run $\ell$, where $\ell = 1,\dots,L = 200$, we proceed as follows: First, parameter estimation is started with $N = 2{,}000$ random draws, indexed by $n = 1,\dots,N$. The samples are generated in the same way as in the above paragraph with distortion parameter $c_\vartheta = 1$. Then the draw with the smallest GMM distance function is used as the starting value of the Quasi-Bayesian sampler. Appendix F describes how the draws, $\vartheta^{(m)}$, from an ergodic Markov Chain are obtained.^13 Finally, parameter estimates $\hat\vartheta_\ell$ as well as the estimates of the variance $[V_{BM}(\hat\vartheta_\ell)]_{ii}$, $i = 1,\dots,p$, are derived from these draws, where the latter are obtained by applying a batch mean estimator (see Flegal and Jones, 2010, in particular, Equation (6)).
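The two-stage idea (random starting values first, then a Markov chain sampler started at the best draw) can be sketched on a one-dimensional toy objective. The sketch assumes a flat prior and a random-walk Metropolis sampler targeting a quasi-posterior proportional to $\exp(-\frac{T}{2} Q(\vartheta))$; the sampler actually used, including the reversible jump move, is described in Appendix F and may differ. The function `toy_distance` and all tuning constants are hypothetical.

```python
import math
import random


def toy_distance(theta):
    """Hypothetical distance function with several local minima; global minimum at 1.5."""
    delta = theta - 1.5
    return 0.05 * delta ** 2 + 0.02 * (1.0 - math.cos(6.0 * delta))


def best_of_random_starts(objective, n_starts, rng):
    """Multistart step: keep the random draw with the smallest distance."""
    starts = [rng.uniform(-5.0, 8.0) for _ in range(n_starts)]
    return min(starts, key=objective)


def quasi_posterior_draws(objective, start, T, n_draws, step, rng):
    """Random-walk Metropolis for a density proportional to exp(-T/2 * objective)."""
    theta, log_dens = start, -0.5 * T * objective(start)
    draws = []
    for _ in range(n_draws):
        proposal = theta + step * rng.gauss(0.0, 1.0)
        log_dens_prop = -0.5 * T * objective(proposal)
        if rng.random() < math.exp(min(0.0, log_dens_prop - log_dens)):
            theta, log_dens = proposal, log_dens_prop
        draws.append(theta)
    return draws


rng = random.Random(42)
start = best_of_random_starts(toy_distance, 2000, rng)
draws = quasi_posterior_draws(toy_distance, start, T=500, n_draws=20000, step=0.1, rng=rng)
kept = draws[5000:]                      # discard burn-in draws
estimate = sum(kept) / len(kept)         # quasi-posterior mean as point estimate
print(start, estimate)
```

The point estimate is an average over draws rather than the output of a local optimizer, which is what makes the procedure less sensitive to the many local minima of the distance function.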


Tables 1 and 2 present results from our Monte Carlo experiments. In both tables the true parameter vector $\vartheta$ is provided in the second column. In Table 1 the data are generated such that $\theta^P = 1.5 \ne 10 = \theta^Q$, while $\theta^Q = \theta^P = 1.5$ in Table 2. In all Monte Carlo experiments an unrestricted model is estimated. That is, we obtain separate estimates for $\theta^Q$ and $\theta^P$, respectively. We force our multistart random search routine to generate samples such that $\theta^Q = \theta^P$ as well as $\theta^Q \ne \theta^P$ (for both experiments presented in Tables 1 and 2, respectively). In addition, a reversible jump move, based on Green (1995) and Richardson and Green (1997), is included in the Bayesian sampler. The reversible jump move turned out to be useful in the case when $\theta^Q = \theta^P$ (see Appendix F for more details).

From the estimates $\hat\vartheta_\ell$, $\ell = 1,\dots,L = 200$, we obtain the sample mean, median, minimum (min), maximum (max), standard deviation (std), skewness (skew) and kurtosis (kurt). These descriptive statistics are reported in columns three to nine of Tables 1 and 2. The last column presents the absolute difference between the sample mean of the estimates and the true parameter value.

Comparing results based on Quasi-Bayesian methods (see Table 1) for the case when $\theta^Q \ne \theta^P$ to results based on a standard minimization procedure (see Table 5 in Supplementary Material E), we see that the Quasi-Bayesian approach reduces the standard deviations of the point estimates for most parameters. For example, the standard deviation of the point estimate of $\theta^Q$ is reduced from 6.05 (see Table 5) to approximately 3.50 (see Table 1). Similar effects are observed for the estimates of the terms driving volatility, i.e., $\Sigma_1$, $\Sigma_2$, $\Sigma_3$ and $\sigma_\varepsilon$, which are difficult to estimate. By considering the smallest and the largest point estimates (min and max in the corresponding tables), we observe a substantially smaller dispersion in the point estimates for the Quasi-Bayesian approach. Note that an estimate of $\theta^P$ is an estimate of the expected value of the first component of the process $(X(t))_{t \ge 0}$. Since the serial correlation of $(X(t))_{t \ge 0}$ is quite high, we know from estimating means of an autoregressive process that the standard error of the estimator of the mean becomes large (e.g., when the Fisher-information matrix of an AR(1) process is calculated). Similar results are presented in Table 2 for the $\theta^Q = \theta^P$ case.

^13 In our analysis $m = 1, 2,\dots, M = 20{,}000$.



| Parameter | $\vartheta$ | mean | median | min | max | std | skew | kurt | $\lvert \mathrm{mean} - \vartheta \rvert$ |
|---|---|---|---|---|---|---|---|---|---|
| $\theta^Q$ | 10 | 8.8660 | 8.5486 | 0.1534 | 19.2247 | 3.5008 | 0.6071 | 4.2157 | 1.1340 |
| $\theta^P$ | 1.5 | 1.6610 | 1.4883 | 0.0042 | 2.5920 | 1.0643 | 0.4911 | -0.3916 | 0.1610 |
| $\beta^Q_{11}$ | -1 | -1.6418 | -1.2797 | -9.2212 | -0.4173 | 1.5798 | -3.2923 | 11.8217 | 0.6418 |
| $\beta^Q_{21}$ | 0.2 | 0.1817 | 0.1524 | 0.0025 | 0.3591 | 0.1299 | 1.8759 | 6.5029 | 0.0183 |
| $\beta^Q_{31}$ | 0.02 | 0.0350 | 0.0214 | 1.86E-5 | 0.3473 | 0.0457 | 3.1329 | 14.2229 | 0.0150 |
| $\beta^Q_{22}$ | -1 | -1.4731 | -1.0671 | -8.1154 | -0.4823 | 1.1478 | -2.7690 | 10.1519 | 0.4731 |
| $\beta^Q_{32}$ | 0.04 | 0.0373 | 0.0219 | -0.0662 | 0.2711 | 0.0606 | 2.3781 | 10.2813 | 0.0027 |
| $\beta^Q_{23}$ | 0 | 0.0006 | -0.0003 | -0.0840 | 0.0266 | 0.0176 | 1.5436 | 16.5725 | 0.0006 |
| $\beta^Q_{33}$ | -0.8 | -1.5327 | -1.2070 | -7.8466 | -0.6308 | 1.2389 | -2.5704 | 8.1375 | 0.7327 |
| $\beta^P_{11}$ | -1 | -1.5069 | -0.9650 | -7.0168 | -0.1670 | 1.4929 | -1.5812 | 1.8702 | 0.5069 |
| $\beta^P_{21}$ | 0.02 | 0.0288 | 0.0037 | 3.67E-6 | 0.0170 | 0.0778 | 5.4759 | 35.4115 | 0.0088 |
| $\beta^P_{31}$ | 0.01 | 0.0099 | 0.0032 | 4.44E-7 | 0.0006 | 0.0206 | 5.0431 | 32.4652 | 0.0001 |
| $\beta^P_{22}$ | -0.7 | -1.1194 | -0.6085 | -7.5792 | -0.1400 | 1.2938 | -2.2389 | 6.0933 | 0.4194 |
| $\beta^P_{32}$ | 0.01 | -1.1194 | -0.6085 | -7.5792 | -0.1400 | 1.2938 | -2.2389 | 6.0933 | 1.1294 |
| $\beta^P_{23}$ | 0 | -0.0015 | 0.0000 | -0.0551 | 0.0017 | 0.0104 | -1.1382 | 8.4369 | 0.0015 |
| $\beta^P_{33}$ | -0.7 | -0.9059 | -0.4692 | -6.5051 | -0.1844 | 1.1881 | -2.7918 | 7.8669 | 0.2059 |
| $B^x_{12}$ | 0.1 | 0.0623 | 0.0123 | 2.43E-6 | 0.0493 | 0.1652 | 5.6178 | 35.3695 | 0.0377 |
| $B^x_{13}$ | 0.01 | 0.1045 | 0.0352 | 8.08E-7 | 0.8676 | 0.1856 | 3.3315 | 13.2906 | 0.0945 |
| $\gamma_0$ | 2 | 1.7855 | 1.8939 | -0.0070 | 3.2115 | 0.8411 | -0.0384 | -0.0520 | 0.2145 |
| $\Sigma_1$ | 0.7 | 0.5921 | 0.5238 | 0.2002 | 1.3639 | 0.3121 | 0.9334 | 0.4182 | 0.1079 |
| $\Sigma_2$ | 1 | 0.4704 | 0.3714 | 0.1060 | 0.9983 | 0.3336 | 1.3636 | 1.4000 | 0.5296 |
| $\Sigma_3$ | 0.8 | 0.4563 | 0.3447 | 0.1071 | 1.0514 | 0.3451 | 1.4573 | 1.8141 | 0.3437 |
| $\sigma_\varepsilon$ | 0.0067 | 0.0113 | 0.0096 | 0.0053 | 0.0176 | 0.0047 | 0.7672 | -0.5302 | 0.0046 |

Table 1: Parameter estimates for the $A_1(3)$ model based on Quasi-Bayesian methods. Data simulated with $M = 10$, $T = 500$ and $\theta^Q \ne \theta^P$. $c_\vartheta = 1$ is controlling for the noise in the generation of the starting value of the optimization routine. Statistics are obtained from $L = 200$ simulation runs. mean, median, min, max, std, skew and kurt stand for the sample mean, median, minimum, maximum, standard deviation, skewness and kurtosis of the point estimates $\hat\vartheta_\ell$, $\ell = 1,\dots,L$. $\lvert \mathrm{mean} - \vartheta \rvert$ stands for the absolute value of the mean deviation from the true parameter. The true parameter values $\vartheta$ are reported in the second column.


4.2 Inference 


The asymptotic distribution of $\sqrt{T}(\hat\vartheta - \vartheta)$ is a normal distribution with mean vector $0_p$ and the asymptotic covariance matrix $V$ (for more details and regularity conditions see, e.g., Hansen, 1982; Pötscher and Prucha, 1997; Newey and McFadden, 1994; Ruud, 2000). As our test statistics rely on asymptotic results, we have to investigate the finite sample properties of our tests. Since a lot of parameters are considered and various restrictions can be constructed, we focus now on the restriction $\theta^Q = \theta^P$, which is often discussed in finance literature.

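To test for parameter restrictions, we assume that the null hypothesis consists of $r_p$ restrictions. Suppose that these restrictions are described by a twice continuously differentiable function $r(\vartheta): \mathbb{R}^{p} \to \mathbb{R}^{r_p}$ and the $r_p \times p$ matrix of partial derivatives

$$R = D_\vartheta r(\hat\vartheta) = \begin{pmatrix} \dfrac{\partial r_1(\hat\vartheta)}{\partial \vartheta_1} & \cdots & \dfrac{\partial r_1(\hat\vartheta)}{\partial \vartheta_p} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial r_{r_p}(\hat\vartheta)}{\partial \vartheta_1} & \cdots & \dfrac{\partial r_{r_p}(\hat\vartheta)}{\partial \vartheta_p} \end{pmatrix}, \qquad (24)$$

which has rank $r_p$. Under the null hypothesis we have $r(\vartheta) = 0_{r_p}$ and thus the Wald statistic becomes

$$W = T\, r(\hat\vartheta)' \left( R \hat V_T R' \right)^{-1} r(\hat\vartheta), \qquad (25)$$

where $\hat V_T$ is an estimate of the asymptotic covariance matrix of $\sqrt{T}(\hat\vartheta - \vartheta)$. Under the null hypothesis the Wald statistic $W$ follows a $\chi^2$-distribution with $r_p$ degrees of freedom. The null hypothesis is rejected if $W > \chi^2_{r_p, 1-\alpha_S}$, where $\alpha_S$ is the significance level and $\chi^2_{r_p, 1-\alpha_S}$ is the $1-\alpha_S$ percentile of a $\chi^2$-distribution with $r_p$ degrees of freedom. In particular, if the goal is to test the null hypothesis $\theta^Q = \theta^P$ against the alternative $\theta^Q \ne \theta^P$, then $r_p = 1$, $r(\vartheta) = (1,-1,0,\dots,0)\,\vartheta = \theta^Q - \theta^P$ and $R = (1,-1,0,\dots,0)$.^14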
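For a single linear restriction the Wald statistic (25) reduces to a scalar expression, which the following sketch evaluates. The parameter values, the covariance matrix and the function name `wald_single_restriction` are hypothetical; the p-value uses the identity that the survival function of a $\chi^2$-distribution with one degree of freedom equals $\mathrm{erfc}(\sqrt{W/2})$.

```python
import math


def wald_single_restriction(theta_hat, V_hat, R, T):
    """Wald statistic T * r' (R V R')^{-1} r for one linear restriction R * theta = 0."""
    p = len(theta_hat)
    r = sum(R[i] * theta_hat[i] for i in range(p))
    RVR = sum(R[i] * V_hat[i][j] * R[j] for i in range(p) for j in range(p))
    W = T * r * r / RVR
    p_value = math.erfc(math.sqrt(W / 2.0))     # chi-square(1) survival function
    return W, p_value


# Hypothetical estimates; the first two coordinates play the role of the two means.
theta_hat = [1.8, 1.5, 0.2]
V_hat = [[4.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
R = [1.0, -1.0, 0.0]
W, p_value = wald_single_restriction(theta_hat, V_hat, R, T=500)
print(W, p_value)   # W = 500 * 0.3^2 / 5 = 9
```

The quality of the test therefore hinges on the estimate of $R V R'$, which is the quantity that the different covariance estimators discussed next try to approximate.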

Appendix F demonstrates that the performance of the Wald test implemented in a standard way (as well as the distance difference test) is poor. In particular with $c_\vartheta = 1$, substantial undersizing is observed for the Wald test while the power is very low. With the distance difference test we observe only minor oversizing, and even if its power is already better than the power of the Wald test, it is still low (approximately 55% rejection rate on a 5% significance level).^15 To implement a "standard" Wald or distance difference test, the $p \times p$ covariance matrix $V$ is estimated by means of the "standard GMM covariance matrix estimate" (see, e.g., Ruud, 2000, Chapters 21 and 22, for a "standard" implementation of the Wald and the distance difference test). That is, when the following estimate is applied

$$\hat V_T = \left( H_T^{\top} \Lambda_T^{-1} H_T \right)^{-1}, \quad \text{where} \quad H_T = D_\vartheta h_T(\hat\vartheta; y_{1:T}) \in \mathbb{R}^{q \times p} \quad \text{and} \quad \Lambda_T = \frac{1}{T}\sum_{t=2}^{T} h_{(t)}(\hat\vartheta; y_{1:T})\, h_{(t)}(\hat\vartheta; y_{1:T})^{\top} \in \mathbb{R}^{q \times q}. \qquad (26)$$

Note that in (26) matrices of dimension $p \times p$ (with $p \ge 23$) have to be inverted and partial derivatives in the matrix $H_T(\hat\vartheta; y_{1:T})$ have to be derived numerically. Hence, estimating the covariance matrix $V$ by means of (26) is numerically demanding. Additionally, $H_T$ as well as $\Lambda_T$ also depend on $y_{1:T}$ and therefore are subject to the variation of the finite samples.

^14 The components of the parameter vector $\vartheta$ are presented in the first column of Table 1.

^15 We used here the same simulation designs as in Tables 1 and 2.

To cope with this problem, we use the output of the Bayesian sampler to perform inference. Based on Chernozhukov and Hong (2003), asymptotic normality still holds and the draws from an ergodic Markov Chain, $\vartheta^{(m)}$, can be used to estimate the covariance matrix $V$. In particular, to estimate the asymptotic variance of $\hat\theta^Q - \hat\theta^P = (1,-1,0,\dots,0)\,\hat\vartheta$, we use Markov-Chain Monte Carlo output and the batch mean estimator (see Flegal and Jones, 2010, Equation (6)). For the Wald test, rejection rates of the true and the false null hypothesis are provided in Table 3. We observe that the rejection rates of the true null hypothesis are quite close to their theoretical values $\alpha_S$.
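A batch mean estimator of the kind referred to above can be sketched as follows. The sketch uses the common non-overlapping form with roughly $\sqrt{n}$ batches; whether this coincides in every detail with the variant in Flegal and Jones (2010, Equation (6)) is an assumption. The AR(1) chain only serves as a stand-in for serially correlated sampler output, and the function name is hypothetical.

```python
import math
import random


def batch_means_variance(draws):
    """Non-overlapping batch means estimate of sigma^2 in
    sqrt(n) * (mean(draws) - mu) -> N(0, sigma^2)."""
    n = len(draws)
    a = int(math.sqrt(n))              # number of batches (a common default)
    b = n // a                         # batch length
    used = draws[n - a * b:]           # drop the oldest draws if n != a * b
    overall = sum(used) / (a * b)
    batch_means = [sum(used[k * b:(k + 1) * b]) / b for k in range(a)]
    return b / (a - 1) * sum((m - overall) ** 2 for m in batch_means)


# Strongly autocorrelated toy chain: AR(1) with coefficient 0.9.
rng = random.Random(7)
chain, x = [], 0.0
for _ in range(40000):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    chain.append(x)

mean = sum(chain) / len(chain)
marginal_var = sum((v - mean) ** 2 for v in chain) / (len(chain) - 1)
sigma2 = batch_means_variance(chain)
std_error = math.sqrt(sigma2 / len(chain))     # standard error of the chain mean
print(marginal_var, sigma2, std_error)
```

For this chain the long-run variance is $1/(1-0.9)^2 = 100$, far above the marginal variance of about 5.3, which illustrates why ignoring serial correlation in the draws would understate the standard errors.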


5 Parameter Estimation in Empirical Data 

This section applies the estimator developed in the previous sections to empirical data. We downloaded H-15 interest rate data from the Federal Reserve.^16 In particular, we used weekly data (measured every Friday) of "Treasury constant maturity" yields. The time period considered is August 3, 2001 to August 30, 2013. An almost full panel of maturities from one month to thirty years is available for these periods. Since the thirty year maturity time series exhibits a lot of missing values, this maturity has been excluded. Thus, we have $M = 10$ maturities such that $\tau = \{1/12, 1/4, 1/2, 1, 2, 3, 5, 7, 10, 20\}$ and $T = 631$ observations per yield. Although the H-15 data set can only be seen as a proxy for the risk-free term structure, we follow the related literature (see, e.g., Chib and Ergashev, 2009) and work with this dataset.

^16 http://federalreserve.gov/releases/h15/data.htm
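The sample size can be cross-checked from the stated sampling scheme. The short sketch below simply counts the Fridays between the two dates given in the text; it assumes one observation per Friday and does not access the H-15 data itself.

```python
from datetime import date, timedelta

start, end = date(2001, 8, 3), date(2013, 8, 30)

fridays = []
day = start
while day <= end:
    fridays.append(day)
    day += timedelta(days=7)

# weekday() == 4 corresponds to Friday in Python's convention (Monday == 0).
print(len(fridays), start.weekday(), end.weekday())
```

The count agrees with the $T = 631$ weekly observations reported above.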


In contrast to the analysis in Section 4, where $L$ draws from the data generating process were considered, this section investigates one panel of interest rate data. The purpose of running the GMM estimation procedure $L$ times with the same data is to check for the stability of our estimation routine in the empirical data.^17 By doing this, we observe that in all runs, $\ell = 1,\dots,L = 5$, the intervals $[\hat\vartheta_\ell]_j \pm [V_{BM}(\hat\vartheta_\ell)]_{jj}^{0.5}$, $j = 1,\dots,p$, overlap. Without the Quasi-Bayesian algorithm, this stability result would not have been attained. In addition, in all runs the p-values for the test $\theta^Q = \theta^P$ against the two-sided alternative $\theta^Q \ne \theta^P$ are smaller than 0.05. Hence, we reject the null hypothesis $\theta^Q = \theta^P$ at the significance level $\alpha_S = 0.05$.

To obtain parameter estimates, the draws of the Bayesian sampler $\vartheta^{(m)}$, $m = 5{,}001,\dots,20{,}000$, are used, from which we obtain the sample mean $\bar\vartheta$ and the vector of sample standard deviations $\left([V_{BM}(\bar\vartheta)]_{11}^{0.5},\dots,[V_{BM}(\bar\vartheta)]_{pp}^{0.5}\right)'$, where again the batch mean estimator of Flegal and Jones (2010) [Equation 6] is applied. In contrast to Tables 1 and 2, where the descriptive statistics based on the various point estimates are presented, we now obtain the sample median, $median_\vartheta$, sample minimum, $min_\vartheta$, sample maximum, $max_\vartheta$, sample standard deviation, $std_\vartheta$, sample skewness, $skew_\vartheta$, and sample kurtosis, $kurt_\vartheta$, from the draws of one particular chain $\vartheta^{(m)}$, $m = 5{,}001,\dots,20{,}000$. These descriptive statistics are presented in Table 4.


Following mathematical finance literature (see, e.g., Cheridito et al., 2007; Cochrane, 2005), a way to investigate how the market demands a compensation (risk premium) for the risk generated by $W^P(t)$ is to consider the market price of risk process $(\phi(X(t)))_{t \ge 0}$. This process depends on the model parameters $\vartheta$. If $b^P = b^Q$ and $\beta^P = \beta^Q$, then $\phi(X(t)) = 0_d$. In terms of the parametrization used in this article, $\phi(X(t)) = 0_d$ if $\theta^P = \theta^Q$ and $\beta^P = \beta^Q$, while if $\theta^P \ne \theta^Q$ or $\beta^P \ne \beta^Q$, then $\phi(X(t)) \ne 0_d$ (almost surely). In the following we test whether this is the case.

By considering the estimates $\hat\theta^Q = 12.0667$ and $\hat\theta^P = 0.0682$ and their estimated standard deviations $[V_{BM}(\hat\theta^Q)]^{0.5} = 2.0573$ and $[V_{BM}(\hat\theta^P)]^{0.5} = 0.0728$, respectively, we observe that the difference in the parameter estimates is relatively large compared to their estimated standard deviations. We obtained the Wald statistic $W = 4.32546$ with p-value being 0.03056. Based on this, the null hypothesis $\theta^Q = \theta^P$ is rejected at the significance level $\alpha_S = 0.05$ for this empirical data set.

^17 For the multistart random search, the vector of parameters presented in the second column of Table 1 is used.

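Next, we perform the test $\beta^Q = \beta^P$ against the alternative $\beta^Q \ne \beta^P$, where each of $\beta^Q$ and $\beta^P$ contains seven parameters. In more detail, we test the null hypothesis $(\beta^Q_{11},\beta^Q_{21},\beta^Q_{31},\beta^Q_{22},\beta^Q_{32},\beta^Q_{23},\beta^Q_{33}) = (\beta^P_{11},\beta^P_{21},\beta^P_{31},\beta^P_{22},\beta^P_{32},\beta^P_{23},\beta^P_{33})$ against the two-sided alternative $(\beta^Q_{11},\beta^Q_{21},\beta^Q_{31},\beta^Q_{22},\beta^Q_{32},\beta^Q_{23},\beta^Q_{33}) \ne (\beta^P_{11},\beta^P_{21},\beta^P_{31},\beta^P_{22},\beta^P_{32},\beta^P_{23},\beta^P_{33})$. By estimating the difference $(\beta^Q_{11},\dots,\beta^Q_{33}) - (\beta^P_{11},\dots,\beta^P_{33})$ and its covariance matrix from Monte Carlo output, we obtain the Wald statistic $W = 38.7047$ with a corresponding p-value of 2.223E-6. That is, also the null hypothesis $(\beta^Q_{11},\dots,\beta^Q_{33}) = (\beta^P_{11},\dots,\beta^P_{33})$ is rejected on significance levels $\alpha_S \ge 0.01$. Summing up, since the null hypotheses $\theta^Q = \theta^P$ and $\beta^Q = \beta^P$ are rejected, the market price of risk process is significantly different from zero.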
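The order of magnitude of the p-value of the joint test can be checked with a few lines of code. The sketch evaluates the $\chi^2$ survival function through its closed form for integer degrees of freedom, using only the standard library; the function name `chi2_sf` is hypothetical, and the input is the rounded statistic $W = 38.7047$ with seven restrictions as stated in the text.

```python
import math


def chi2_sf(x, df):
    """Survival function P(X > x) of a chi-square variable with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_j x^j / (1*3*...*(2j+1))
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    tail = math.erfc(math.sqrt(x / 2.0))
    return tail + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * total


p_joint = chi2_sf(38.7047, 7)
print(p_joint)
```

The result is of the order $2.2 \times 10^{-6}$, in line with the p-value reported for the joint test; small differences in the last digits are to be expected because the statistic is given in rounded form.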


6 Conclusions 


In this article we developed a new method allowing for parameter estimation based on the exact moments of the yields for affine term structure models. By applying the results of Cuchiero et al. (2012) on p-polynomial processes the conditional moments are derived. By assuming a stationary process, we obtain the exact moments of the yields as well as the first order auto-covariance of the yields and the squared yields. By means of these moments, the model parameters can be estimated by the generalized method of moments.

Since the number of parameters is relatively large and the moments are non-linear in the model parameters, the implementation of the generalized method of moments becomes a non-trivial problem. We observe that standard minimization routines perform poorly. To cope with this problem, we use random search methods combined with Quasi-Bayesian methods to minimize the GMM distance function as proposed in Chernozhukov and Hong (2003). By these techniques parameter estimation becomes more stable. The standard deviations as well as the dispersions of the point estimates decrease for most parameters, compared to parameter estimation based on a standard minimization of the GMM distance function. For some parameters this decline is substantial.

Another main contribution of this article is a rigorous investigation of the testing problem, whether parameters controlling for the mean of the latent affine process in the empirical and in the equivalent martingale measure are different. We observe substantial undersizing when implementing a Wald test based on standard estimates of the covariance matrix of the unknown parameter. By applying methods developed by Chernozhukov and Hong (2003), the standard errors of the corresponding components of the parameter vector can be obtained from the draws provided by a Bayesian sampler. We observe that in this case the rejection rates of the true null hypothesis are close to theoretically correct levels.

In a final step, our estimation methodology is applied to empirical term structure data. By applying the testing procedure developed in this article, the null hypothesis of equal parameters controlling for the mean of the latent affine process, in the empirical as well as in the equivalent martingale measure, is rejected. Our estimates support a significant market price of risk.

| Parameter | $\vartheta$ | mean | median | min | max | std | skew | kurt | $\lvert \mathrm{mean} - \vartheta \rvert$ |
|---|---|---|---|---|---|---|---|---|---|
| $\theta^Q$ | 1.5 | 1.7127 | 1.2500 | 0.0148 | 5.4034 | 1.5225 | 2.4322 | 6.6231 | 0.2127 |
| $\theta^P$ | 1.5 | 1.4298 | 1.4745 | 0.0218 | 2.1810 | 0.5370 | -0.2753 | 0.6087 | 0.0702 |
| $\beta^Q_{11}$ | -1 | -0.9482 | -0.7216 | -9.3936 | -0.2657 | 1.1017 | -5.9892 | 42.4434 | 0.0518 |
| $\beta^Q_{21}$ | 0.2 | 0.2760 | 0.1745 | 0.0082 | 0.5801 | 0.3184 | 2.7465 | 8.6887 | 0.0760 |
| $\beta^Q_{31}$ | 0.02 | 0.0365 | 0.0188 | 0.0001 | 0.0271 | 0.0501 | 3.5544 | 16.0667 | 0.0165 |
| $\beta^Q_{22}$ | -1 | -1.4434 | -1.1180 | -8.5167 | -0.6810 | 1.1585 | -2.6154 | 10.2604 | 0.4434 |
| $\beta^Q_{32}$ | 0.04 | 0.0391 | 0.0280 | -0.0514 | 0.0828 | 0.0483 | 1.8007 | 6.8699 | 0.0009 |
| $\beta^Q_{23}$ | 0 | -0.0013 | -0.0001 | -0.0562 | 0.0295 | 0.0108 | -0.9656 | 8.6647 | 0.0013 |
| $\beta^Q_{33}$ | -0.8 | -1.3134 | -1.0069 | -6.6218 | -0.5230 | 1.0095 | -2.1866 | 6.7837 | 0.5134 |
| $\beta^P_{11}$ | -1 | -1.8616 | -1.4688 | -6.8225 | -0.7239 | 1.5857 | -0.8374 | 0.4486 | 0.8616 |
| $\beta^P_{21}$ | 0.02 | 0.2610 | 0.1233 | 0.0017 | 0.8445 | 0.4000 | 4.3370 | 26.3265 | 0.2410 |
| $\beta^P_{31}$ | 0.01 | 0.0314 | 0.0127 | 0.0001 | 0.0602 | 0.0489 | 3.4009 | 15.1149 | 0.0214 |
| $\beta^P_{22}$ | -0.7 | -1.1592 | -0.8226 | -6.6295 | -0.1769 | 1.0613 | -1.5312 | 4.0173 | 0.4592 |
| $\beta^P_{32}$ | 0.01 | 0.0383 | 0.0207 | -0.1791 | 0.0872 | 0.0606 | 1.8303 | 6.8339 | 0.0283 |
| $\beta^P_{23}$ | 0 | -0.0010 | 0.0002 | -0.2496 | 0.0425 | 0.0231 | -4.7594 | 70.9923 | 0.0010 |
| $\beta^P_{33}$ | -0.7 | -1.3493 | -1.0871 | -6.3858 | -0.1818 | 1.2570 | -1.5028 | 3.0873 | 0.6493 |
| $B^x_{12}$ | 0.1 | 0.0769 | 0.0326 | 0.0007 | 0.0456 | 0.1328 | 3.8884 | 18.7903 | 0.0231 |
| $B^x_{13}$ | 0.01 | 0.1262 | 0.0677 | 0.0019 | 0.2241 | 0.1816 | 4.0900 | 22.9321 | 0.1162 |
| $\gamma_0$ | 2 | 1.9495 | 1.9564 | 0.0111 | 2.1332 | 0.5018 | -2.0313 | 7.2117 | 0.0505 |
| $\Sigma_1$ | 0.7 | 0.8427 | 0.9478 | 0.0186 | 1.1315 | 0.3418 | -0.7662 | -0.4319 | 0.1427 |
| $\Sigma_2$ | 1 | 0.6263 | 0.5797 | 0.0225 | 1.0547 | 0.3573 | 0.4775 | -0.4588 | 0.3737 |
| $\Sigma_3$ | 0.8 | 0.5591 | 0.4891 | 0.0182 | 1.2413 | 0.3631 | 0.6926 | -0.4573 | 0.2409 |
| $\sigma_\varepsilon$ | 0.0067 | 0.0106 | 0.0093 | 0.0009 | 0.0215 | 0.0049 | 0.6131 | -0.2135 | 0.0039 |

Table 2: Parameter estimates for the $A_1(3)$ model based on Quasi-Bayesian methods. Data simulated with $M = 10$, $T = 500$ and $\theta^Q = \theta^P$. $c_\vartheta = 1$ is controlling for the noise in the generation of the starting value of the optimization routine. Statistics are obtained from $L = 200$ simulation runs. mean, median, min, max, std, skew and kurt stand for the sample mean, median, minimum, maximum, standard deviation, skewness and kurtosis of the point estimates $\hat\vartheta_\ell$, $\ell = 1,\dots,L$. $\lvert \mathrm{mean} - \vartheta \rvert$ stands for the absolute value of the mean deviation from the true parameter. The true parameter values $\vartheta$ are reported in the second column.


| $\alpha_S$ | $\theta^Q = 10 \ne 1.5 = \theta^P$ | $\theta^Q = \theta^P = 1.5$ |
|---|---|---|
| 0.01 | 1.0000 | 0.0286 |
| 0.05 | 1.0000 | 0.0476 |
| 0.10 | 1.0000 | 0.0857 |

Table 3: Parameter tests based on the Wald test (25). Data simulated with $M = 10$ and $T = 500$; $\alpha_S$ stands for the significance level; $c_\vartheta = 1$ controls for the noise in the generation of the starting value of the optimization routine. The null hypothesis is $\theta^Q = \theta^P$, which is tested against the two-sided alternative $\theta^Q \ne \theta^P$. The draws of the Quasi-Bayesian sampler are used to estimate $\theta^Q$, $\theta^P$ as well as the asymptotic variance of $\hat\theta^Q - \hat\theta^P$. The quantities presented are rejection rates of the null hypothesis given the significance level $\alpha_S$. Statistics are obtained from $L = 200$ simulation runs.

| Parameter | $\bar\vartheta$ | $median_\vartheta$ | $min_\vartheta$ | $max_\vartheta$ | $std_\vartheta$ | $skew_\vartheta$ | $kurt_\vartheta$ |
|---|---|---|---|---|---|---|---|
| $\theta^Q$ | 12.0667 | 11.6886 | 6.1634 | 15.2185 | 2.0573 | -0.5395 | 3.0115 |
| $\theta^P$ | 0.0682 | 0.0460 | 0.0174 | 0.3855 | 0.0728 | 2.7690 | 10.3944 |
| $\beta^Q_{11}$ | -0.1036 | -0.1005 | -0.1754 | -0.1000 | 0.0108 | -3.8346 | 17.1485 |
| $\beta^Q_{21}$ | 0.0331 | 0.0245 | 0.0128 | 0.1169 | 0.0245 | 1.8071 | 5.5307 |
| $\beta^Q_{31}$ | 0.0108 | 0.0088 | 0.0060 | 0.0283 | 0.0052 | 1.6536 | 4.6572 |
| $\beta^Q_{22}$ | -1.4100 | -1.3316 | -2.3193 | -0.8051 | 0.5064 | -0.3689 | 1.6129 |
| $\beta^Q_{32}$ | 0.0925 | 0.0915 | 0.0782 | 0.1203 | 0.0073 | 0.5622 | 3.3120 |
| $\beta^Q_{23}$ | -0.0096 | -0.0092 | -0.0143 | -0.0060 | 0.0020 | -0.4681 | 2.4442 |
| $\beta^Q_{33}$ | -0.8124 | -0.7957 | -1.1401 | -0.7108 | 0.0720 | -2.4422 | 9.4258 |
| $\beta^P_{11}$ | -0.7390 | -0.5119 | -2.1933 | -0.1430 | 0.5474 | -0.8072 | 2.3157 |
| $\beta^P_{21}$ | 0.0542 | 0.0475 | 0.0191 | 0.1164 | 0.0231 | 0.6122 | 2.4151 |
| $\beta^P_{31}$ | 0.0196 | 0.0196 | 0.0083 | 0.0379 | 0.0053 | 0.2360 | 2.9440 |
| $\beta^P_{22}$ | -2.9191 | -2.9900 | -5.5775 | -1.1761 | 1.0701 | -0.2561 | 2.1254 |
| $\beta^P_{32}$ | 0.0047 | 0.0049 | 0.0017 | 0.0088 | 0.0019 | 0.0118 | 1.4428 |
| $\beta^P_{23}$ | -0.0019 | -0.0020 | -0.0030 | -0.0010 | 0.0005 | 0.2345 | 2.0680 |
| $\beta^P_{33}$ | -0.4352 | -0.4247 | -0.8304 | -0.3137 | 0.0704 | -2.0444 | 9.6661 |
| $B^x_{12}$ | 0.0324 | 0.0295 | 0.0155 | 0.0570 | 0.0101 | 0.4526 | 2.0998 |
| $B^x_{13}$ | 0.0871 | 0.0842 | 0.0508 | 0.1412 | 0.0215 | 0.4104 | 2.1585 |
| $\gamma_0$ | 1.8155 | 1.8188 | 1.5172 | 1.9797 | 0.0662 | -0.7029 | 4.5135 |
| $\Sigma_1$ | 0.2004 | 0.2002 | 0.2000 | 0.2024 | 0.0004 | 2.2614 | 8.7845 |
| $\Sigma_2$ | 1.1715 | 1.1476 | 0.7876 | 1.4995 | 0.2554 | 0.0554 | 1.4457 |
| $\Sigma_3$ | 1.4936 | 1.4957 | 1.4258 | 1.4999 | 0.0074 | -3.3418 | 21.3889 |
| $\sigma_\varepsilon$ | 0.0119 | 0.0119 | 0.0106 | 0.0137 | 0.0006 | -0.0545 | 3.4937 |

Table 4: Parameter estimates for empirical H-15 interest rate data for the $A_1(3)$ model. Statistics are obtained from $M = 20{,}000$ draws with $M_b = 5{,}000$ burn-in steps. $\bar\vartheta$ stands for sample mean, $median_\vartheta$ for sample median, $min_\vartheta$ for sample minimum, $max_\vartheta$ for sample maximum, $std_\vartheta$ for sample standard deviation, $skew_\vartheta$ for sample skewness and $kurt_\vartheta$ for sample kurtosis obtained from the draws of the chain ($\vartheta^{(m)}$: $m = M_b + 1,\dots,M$).

A Affine Models


The following paragraphs - based on Filipović (2009) - describe affine processes. Let us assume the following: The state space is given by $\mathcal{X} \subset \mathbb{R}^{d}$, $W(t)$ stands for $d$-dimensional standard Brownian motion on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, Q)$ and for any initial value $X(0) = x$, $x \in \mathcal{X}$, there exists a unique solution $(X(t))$ for the stochastic differential equation

$$dX(t) = \beta^{Q}(X(t))\,dt + \rho(X(t))\,dW(t), \quad \text{where } \beta^{Q}(x) \in \mathbb{R}^{d} \text{ and } \rho(x) \in \mathbb{R}^{d \times d}. \qquad (27)$$

An affine stochastic process is defined as follows:

Definition 2 (Affine Process). Consider $X(t) \in \mathcal{X}$. $(X(t))_{t \ge 0}$ described by the stochastic differential equation (27) is called affine stochastic process if the $\mathcal{F}_s$ conditional characteristic function of $X(t)$ is exponentially affine in $X(s)$, $0 \le s \le t$. Thus, there exist functions $\Phi(t,u) \in \mathbb{C}$ and $\Psi(t,u) \in \mathbb{C}^{d}$, with jointly continuous $t$-derivatives, such that

$$\mathbb{E}\left(\exp(u'X(t)) \mid \mathcal{F}_s\right) = \exp\left(\Phi(t-s,u) + \Psi(t-s,u)'X(s)\right) \qquad (28)$$

for all $u \in i\mathbb{R}^{d}$ and $s \le t$.

As the conditional characteristic function is bounded by one, the real part of the exponent $\Phi(t-s,u) + \Psi(t-s,u)'X(s)$ is negative. The functions $\Phi(t,u)$ and $\Psi(t,u)$ are uniquely determined by (28) for $t \ge 0$ and $u \in i\mathbb{R}^{d}$ and satisfy the initial conditions $\Phi(0,u) = 0$ and $\Psi(0,u) = u$.

If $(X(t))_{t \ge 0}$ is affine, then the drift term $\beta^{Q}(X(t))$ and the (positive definite) diffusion matrix $a(X(t)) = \rho(X(t))\rho(X(t))'$ are affine functions in $X(t)$ (see Filipović (2009) [Definition 10.1 and Theorem 10.1]); i.e., $\beta^{Q}(x) = b^{Q} + \sum_{i=1}^{d} x_i \beta_i^{Q}$ and $a(x) = a + \sum_{i=1}^{d} x_i \alpha_i$, where $b^{Q}$, $\beta_i^{Q}$ and $x$ are vectors of dimension $d$ and $a(x)$, $a$ and $\alpha_i$ are $d \times d$ matrices. $\beta^{Q} = (\beta_1^{Q},\dots,\beta_d^{Q})$ is a $d \times d$ matrix. In addition, $\Phi(t,u)$ and $\Psi(t,u)$ solve the following system of Riccati equations; see Filipović (2009) [Chapter 10]:

$$\partial_t \Phi(t,u) = \tfrac{1}{2}\Psi(t,u)'\,a\,\Psi(t,u) + (b^{Q})'\Psi(t,u), \quad \Phi(0,u) = 0,$$
$$\partial_t \Psi_i(t,u) = \tfrac{1}{2}\Psi(t,u)'\,\alpha_i\,\Psi(t,u) + (\beta_i^{Q})'\Psi(t,u), \quad \Psi(0,u) = u, \qquad (29)$$

$i = 1,\dots,d$ and $u \in i\mathbb{R}^{d}$.


B Matrix A for $A_m(3)$ Models

This section derives the matrix $A$ for an arbitrary $A_m(3)$ setting, where $0 \le m \le 3$. In the first step we ignore all the restrictions arising from admissibility, the boundary conditions, stationarity and identification, and calculate $A$ for a model with diagonal diffusion, where all elements in $b^{P}$, $\beta^{P}$, $\Sigma$ and $B^{x}$ are free parameters. To obtain $A$ for a particular $A_m(3)$ model, the corresponding parameter restrictions have to be taken into consideration. Moreover, restrictions on individual elements, for some $ij$, can also be included. This allows a joint treatment of all models.

For the first four moments $x^{k}$, $|k| = 0,\dots,4$, we choose the basis $(1 \mid x_1, x_2, x_3 \mid x_1^2, x_1x_2, \dots, x_3^2 \mid x_1^3, \dots, x_3^3 \mid x_1^4, \dots, x_3^4)$. In this expression we have separated the terms of different power by means of $\mid$. I.e., with $d = 3$, we get one term for $k = 0$, three for $k = 1$, six for $k = 2$, ten for $k = 3$ and fifteen for $k = 4$. Therefore $N = 35$. The elements of matrix $A$ not presented are zero by the model assumptions.
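The size of the basis and the ordering of its elements can be reproduced with a short enumeration. The sketch represents each monomial by the tuple of its variable indices, so that $(1, 1, 2)$ stands for $x_1^2 x_2$; the function name `monomial_basis` is hypothetical, and the lexicographic ordering within each degree is an assumption that matches the degree-two ordering $(x_1^2, x_1x_2, x_1x_3, x_2^2, x_2x_3, x_3^2)$ used in this appendix.

```python
from itertools import combinations_with_replacement


def monomial_basis(d, max_degree):
    """All monomials in x_1..x_d up to max_degree, grouped by degree.

    Each monomial is a tuple of variable indices; the empty tuple is the constant 1.
    """
    blocks = []
    for k in range(max_degree + 1):
        blocks.append(list(combinations_with_replacement(range(1, d + 1), k)))
    return blocks


blocks = monomial_basis(3, 4)
sizes = [len(block) for block in blocks]
N = sum(sizes)
print(sizes, N)          # number of basis elements per degree and in total
print(blocks[2])         # degree-two block
```

The starting rows of the blocks of $A$ follow from the cumulative sizes: rows 2 to 4 for $k = 1$, 5 to 10 for $k = 2$, 11 to 20 for $k = 3$ and 21 to 35 for $k = 4$.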

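In the following we use (19) and start with $k = 0$: Here we immediately observe that the first row of $A$ is $A_{1,:} = 0_{1 \times N}$. With $k = 1$ we obtain the rows 2 to $d+1$ of the matrix $A$ as follows: With $f(x) = x_i$ we set $\partial f/\partial x_i = 1$, $\partial f/\partial x_j = 0$ for $j \ne i$, and $\partial^2 f/\partial x_i \partial x_j = 0$. Hence, $\mathcal{G}(x_i) = b_i^{P} + \beta_i^{P} x$, $i = 1,\dots,d$. This yields

$$A_{2:4,:} = \begin{pmatrix} b_1^{P} & \beta^{P}_{11} & \beta^{P}_{12} & \beta^{P}_{13} & 0 & \dots \\ b_2^{P} & \beta^{P}_{21} & \beta^{P}_{22} & \beta^{P}_{23} & 0 & \dots \\ b_3^{P} & \beta^{P}_{31} & \beta^{P}_{32} & \beta^{P}_{33} & 0 & \dots \end{pmatrix}.$$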


^18 Extensions with jumps are possible - for some theory see Keller-Ressel and Mayerhofer (2012), Mayerhofer et al. (2010), Duffie et al. (2000), Duffie et al. (2003).


























Next, for k = 2 we have to consider d{d + l)/2 = d 2 basis elements, corresponding to rows d + 2 to 
d + 1 + of A. We arrange the basis elements as follows = (xf, xiX2, X1X3, X2X3, X3). Since 

the diffusion matrix is diagonal we have only non-zero elements in the generator for i = j. 

The hrst partial derivatives with respect to xi are 2xi, X 2 , X 3 , 0, 0, 0 for these basis elements. The 
second partial derivatives with respect to xi are 2, 0 ,0, 0, 0, 0, 0, etc. For X 2 and X 3 we proceed in the 


same way. 

For example. 


consider /(x) = xf and thus equation (fT9|) yields Q{x\) = {hf + x) 2 xi -|- 


1 

2 Z^7 = 






X ry. . 

'jl^J 


For /(x) = xiX 2 , where X 2 , = xi and = 


8x2 


dx\X2 


(fT^ and the fact that S(X(t)) and X) are diagonal matricesFu result in ^(xiX 2 ) = ( 6 f + / 3 f’x)x 2 + 
( 62 ’ + xi ^ '1 + 5 [^S ]23 • 1. With X 1 X 3 ,..., X 3 we proceed in the same way. This results in 


A 5 ; 10 , 1;10 


/ 


2bf* + 



2^fi 

2/3 

2/3^3 

0 

0 

0 

0 

... \ 


0 



0 

P21 

^11 + ^22 

P23 

I3i2 

Pl 3 

0 

0 



0 

bi 

0 

t'f 

^31 

^^2 

Pll + P33 

0 

P12 

Pl 3 

0 




siHis 

262^ + ^2^322 

^2^32 

0 

2/3fi 

0 

2/3^2 


0 

0 



0 

0 



0 



1^32 

^22 + ^33 

^2"'3 

0 


\ 

si 50 

siB?3 


2b^ + ^3633 

0 

0 

2^3i 

0 

2/332 

2^3^3 

0 

... ) 


For $k = 3$ we obtain $\binom{5}{3} = 10 = d_3$ elements. Therefore we consider the rows 11 to 20. The basis elements are $x^3 = (x_1^3, x_1^2x_2, x_1^2x_3, x_1x_2^2, x_1x_2x_3, x_1x_3^2, x_2^3, x_2^2x_3, x_2x_3^2, x_3^3)$. Then,


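$$
A_{11:20,1:10} =
\begin{pmatrix}
0 & 3\Sigma_1^2 B_1^0 & 0 & 0 & 3b_1^P + 3\Sigma_1^2 B_{11}^x & 3\Sigma_1^2 B_{21}^x & 3\Sigma_1^2 B_{31}^x & 0 & 0 & 0 \\
0 & 0 & \Sigma_1^2 B_1^0 & 0 & b_2^P & 2b_1^P + \Sigma_1^2 B_{11}^x & 0 & \Sigma_1^2 B_{21}^x & \Sigma_1^2 B_{31}^x & 0 \\
0 & 0 & 0 & \Sigma_1^2 B_1^0 & b_3^P & 0 & 2b_1^P + \Sigma_1^2 B_{11}^x & 0 & \Sigma_1^2 B_{21}^x & \Sigma_1^2 B_{31}^x \\
0 & \Sigma_2^2 B_2^0 & 0 & 0 & \Sigma_2^2 B_{12}^x & 2b_2^P + \Sigma_2^2 B_{22}^x & \Sigma_2^2 B_{32}^x & b_1^P & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & b_3^P & b_2^P & 0 & b_1^P & 0 \\
0 & \Sigma_3^2 B_3^0 & 0 & 0 & \Sigma_3^2 B_{13}^x & \Sigma_3^2 B_{23}^x & 2b_3^P + \Sigma_3^2 B_{33}^x & 0 & 0 & b_1^P \\
0 & 0 & 3\Sigma_2^2 B_2^0 & 0 & 0 & 3\Sigma_2^2 B_{12}^x & 0 & 3b_2^P + 3\Sigma_2^2 B_{22}^x & 3\Sigma_2^2 B_{32}^x & 0 \\
0 & 0 & 0 & \Sigma_2^2 B_2^0 & 0 & 0 & \Sigma_2^2 B_{12}^x & b_3^P & 2b_2^P + \Sigma_2^2 B_{22}^x & \Sigma_2^2 B_{32}^x \\
0 & 0 & \Sigma_3^2 B_3^0 & 0 & 0 & \Sigma_3^2 B_{13}^x & 0 & \Sigma_3^2 B_{23}^x & 2b_3^P + \Sigma_3^2 B_{33}^x & b_2^P \\
0 & 0 & 0 & 3\Sigma_3^2 B_3^0 & 0 & 0 & 3\Sigma_3^2 B_{13}^x & 0 & 3\Sigma_3^2 B_{23}^x & 3b_3^P + 3\Sigma_3^2 B_{33}^x
\end{pmatrix},
$$

where $S_{ii}(X(t)) = B_i^0 + (B_i^x)'X(t)$ and $S_{ij}(X(t)) = 0$ for $i \neq j$, and

$$
A_{11:20,11:20} =
\begin{pmatrix}
3\beta_{11}^P & 3\beta_{12}^P & 3\beta_{13}^P & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\beta_{21}^P & 2\beta_{11}^P + \beta_{22}^P & \beta_{23}^P & 2\beta_{12}^P & 2\beta_{13}^P & 0 & 0 & 0 & 0 & 0 \\
\beta_{31}^P & \beta_{32}^P & 2\beta_{11}^P + \beta_{33}^P & 0 & 2\beta_{12}^P & 2\beta_{13}^P & 0 & 0 & 0 & 0 \\
0 & 2\beta_{21}^P & 0 & \beta_{11}^P + 2\beta_{22}^P & 2\beta_{23}^P & 0 & \beta_{12}^P & \beta_{13}^P & 0 & 0 \\
0 & \beta_{31}^P & \beta_{21}^P & \beta_{32}^P & \beta_{11}^P + \beta_{22}^P + \beta_{33}^P & \beta_{23}^P & 0 & \beta_{12}^P & \beta_{13}^P & 0 \\
0 & 0 & 2\beta_{31}^P & 0 & 2\beta_{32}^P & \beta_{11}^P + 2\beta_{33}^P & 0 & 0 & \beta_{12}^P & \beta_{13}^P \\
0 & 0 & 0 & 3\beta_{21}^P & 0 & 0 & 3\beta_{22}^P & 3\beta_{23}^P & 0 & 0 \\
0 & 0 & 0 & \beta_{31}^P & 2\beta_{21}^P & 0 & \beta_{32}^P & 2\beta_{22}^P + \beta_{33}^P & 2\beta_{23}^P & 0 \\
0 & 0 & 0 & 0 & 2\beta_{31}^P & \beta_{21}^P & 0 & 2\beta_{32}^P & \beta_{22}^P + 2\beta_{33}^P & \beta_{23}^P \\
0 & 0 & 0 & 0 & 0 & 3\beta_{31}^P & 0 & 0 & 3\beta_{32}^P & 3\beta_{33}^P
\end{pmatrix}.
$$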


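Last but not least, with $k = 4$ we have $d_4 = 15$ basis elements $x^4 = (x_1^4, x_1^3x_2, x_1^3x_3, x_1^2x_2^2, x_1^2x_2x_3, x_1^2x_3^2, x_1x_2^3, x_1x_2^2x_3, x_1x_2x_3^2, x_1x_3^3, x_2^4, x_2^3x_3, x_2^2x_3^2, x_2x_3^3, x_3^4)$. Then we obtain $A_{21:35,1:4} = 0_{15 \times 4}$ and

$$
A_{21:35,5:10} =
\begin{pmatrix}
6\Sigma_1^2 B_1^0 & 0 & 0 & 0 & 0 & 0 \\
0 & 3\Sigma_1^2 B_1^0 & 0 & 0 & 0 & 0 \\
0 & 0 & 3\Sigma_1^2 B_1^0 & 0 & 0 & 0 \\
\Sigma_2^2 B_2^0 & 0 & 0 & \Sigma_1^2 B_1^0 & 0 & 0 \\
0 & 0 & 0 & 0 & \Sigma_1^2 B_1^0 & 0 \\
\Sigma_3^2 B_3^0 & 0 & 0 & 0 & 0 & \Sigma_1^2 B_1^0 \\
0 & 3\Sigma_2^2 B_2^0 & 0 & 0 & 0 & 0 \\
0 & 0 & \Sigma_2^2 B_2^0 & 0 & 0 & 0 \\
0 & \Sigma_3^2 B_3^0 & 0 & 0 & 0 & 0 \\
0 & 0 & 3\Sigma_3^2 B_3^0 & 0 & 0 & 0 \\
0 & 0 & 0 & 6\Sigma_2^2 B_2^0 & 0 & 0 \\
0 & 0 & 0 & 0 & 3\Sigma_2^2 B_2^0 & 0 \\
0 & 0 & 0 & \Sigma_3^2 B_3^0 & 0 & \Sigma_2^2 B_2^0 \\
0 & 0 & 0 & 0 & 3\Sigma_3^2 B_3^0 & 0 \\
0 & 0 & 0 & 0 & 0 & 6\Sigma_3^2 B_3^0
\end{pmatrix},
$$

$$
A_{21:35,11:20} =
\begin{pmatrix}
4b_1^P + 6\Sigma_1^2 B_{11}^x & 6\Sigma_1^2 B_{21}^x & 6\Sigma_1^2 B_{31}^x & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
b_2^P & 3b_1^P + 3\Sigma_1^2 B_{11}^x & 0 & 3\Sigma_1^2 B_{21}^x & 3\Sigma_1^2 B_{31}^x & 0 & 0 & 0 & 0 & 0 \\
b_3^P & 0 & 3b_1^P + 3\Sigma_1^2 B_{11}^x & 0 & 3\Sigma_1^2 B_{21}^x & 3\Sigma_1^2 B_{31}^x & 0 & 0 & 0 & 0 \\
\Sigma_2^2 B_{12}^x & 2b_2^P + \Sigma_2^2 B_{22}^x & \Sigma_2^2 B_{32}^x & 2b_1^P + \Sigma_1^2 B_{11}^x & 0 & 0 & \Sigma_1^2 B_{21}^x & \Sigma_1^2 B_{31}^x & 0 & 0 \\
0 & b_3^P & b_2^P & 0 & 2b_1^P + \Sigma_1^2 B_{11}^x & 0 & 0 & \Sigma_1^2 B_{21}^x & \Sigma_1^2 B_{31}^x & 0 \\
\Sigma_3^2 B_{13}^x & \Sigma_3^2 B_{23}^x & 2b_3^P + \Sigma_3^2 B_{33}^x & 0 & 0 & 2b_1^P + \Sigma_1^2 B_{11}^x & 0 & 0 & \Sigma_1^2 B_{21}^x & \Sigma_1^2 B_{31}^x \\
0 & 3\Sigma_2^2 B_{12}^x & 0 & 3b_2^P + 3\Sigma_2^2 B_{22}^x & 3\Sigma_2^2 B_{32}^x & 0 & b_1^P & 0 & 0 & 0 \\
0 & 0 & \Sigma_2^2 B_{12}^x & b_3^P & 2b_2^P + \Sigma_2^2 B_{22}^x & \Sigma_2^2 B_{32}^x & 0 & b_1^P & 0 & 0 \\
0 & \Sigma_3^2 B_{13}^x & 0 & \Sigma_3^2 B_{23}^x & 2b_3^P + \Sigma_3^2 B_{33}^x & b_2^P & 0 & 0 & b_1^P & 0 \\
0 & 0 & 3\Sigma_3^2 B_{13}^x & 0 & 3\Sigma_3^2 B_{23}^x & 3b_3^P + 3\Sigma_3^2 B_{33}^x & 0 & 0 & 0 & b_1^P \\
0 & 0 & 0 & 6\Sigma_2^2 B_{12}^x & 0 & 0 & 4b_2^P + 6\Sigma_2^2 B_{22}^x & 6\Sigma_2^2 B_{32}^x & 0 & 0 \\
0 & 0 & 0 & 0 & 3\Sigma_2^2 B_{12}^x & 0 & b_3^P & 3b_2^P + 3\Sigma_2^2 B_{22}^x & 3\Sigma_2^2 B_{32}^x & 0 \\
0 & 0 & 0 & \Sigma_3^2 B_{13}^x & 0 & \Sigma_2^2 B_{12}^x & \Sigma_3^2 B_{23}^x & 2b_3^P + \Sigma_3^2 B_{33}^x & 2b_2^P + \Sigma_2^2 B_{22}^x & \Sigma_2^2 B_{32}^x \\
0 & 0 & 0 & 0 & 3\Sigma_3^2 B_{13}^x & 0 & 0 & 3\Sigma_3^2 B_{23}^x & 3b_3^P + 3\Sigma_3^2 B_{33}^x & b_2^P \\
0 & 0 & 0 & 0 & 0 & 6\Sigma_3^2 B_{13}^x & 0 & 0 & 6\Sigma_3^2 B_{23}^x & 4b_3^P + 6\Sigma_3^2 B_{33}^x
\end{pmatrix}.
$$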
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C Moments of the Observed Yields 


The following paragraphs obtain the first four moments of the observed yields (orders $k = 1,\dots,4$), the auto-covariance of the yields, and the auto-covariance of the squared yields. Assumption 2 specifies the moments of the observation errors $\varepsilon_t$. If $p$ moments of $y_t$ should be considered, we get the number of moments by summing over the multinomial coefficients, i.e.

$$
\sum_{j=1}^{p} \binom{j+M-1}{j}. \qquad (31)
$$


Powers of sums can be obtained by means of the multinomial formula. With $k = \sum_{i=1}^{d} k_i$ ($k_i \geq 0$),

$$
(x_1 + x_2 + \dots + x_d)^k = \sum_{k_1 + \dots + k_d = k} \binom{k}{k_1,\dots,k_d} \prod_{i=1}^{d} x_i^{k_i}, \qquad (32)
$$

where $\binom{k}{k_1,\dots,k_d} = \frac{k!}{k_1! \cdots k_d!}$. Let $d(i,K)$ be given by equation (33) for $K \in \mathbb{N}$ and $i \leq p$. In accordance with equation (8), we write $d_i = d(i,d)$, that is, the notation is simplified when $K = d$. Note that $d_i$ is the dimension of the conditional moments $\mathbb{E}(X(t)^i \mid X(s) = x)$. In addition, $N_i = \sum_{j=0}^{i} d_j$ is the number of conditional moments of order smaller than or equal to $i$.

We shall derive the first four moments, which implies that $p = 4$ in the following. From (22) we get

^'’Here we derive the 1st moments with fc = 1, ( = 0, where Efsti) = 0 for i = 1,..., M. For the 2nd moments: k = 2, 
I = 0, such that E {su) = for all i; with k = I = 1, i j we get E(£ti£tj) = 0, i j. For the 3rd moments: k — 3, I = 0, all 
i, k = 2, I = 1, i j, and k = 1, I = 2, i j. All these terms are zero by assumption, i.e. E {stiStj) = 0, and E (etiEti) — 0, 
i j. For the 4th moments: fc = 4, / = 0, all i, fc = 3, Z = 1, i 7 ^ j, ^ ^ = 2, i 7 ^ j, ^ = IN — 3, i j, E = 0, 

E (e?i£tj) = 0, and E (eti£tj) = 0, i 7 ^ j, E (sti) = n-f. Note that af stands for the fourth moment of eu, where in general 
mi) i^Oi- 


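$$
d(i,K) = \binom{i+K-1}{i} \qquad (33)
$$

the first moments by means of

$$
\mathbb{E}(y_t) = \Phi + \Psi\,\mathbb{E}(X_t), \qquad
\mathbb{E}(y_{ti}) = \Phi_i + \Psi_i'\,\mathbb{E}(X_t) = \Phi_i + \Psi_i'\,\mathbb{E}(\mathbf{X}_{t,1:d}), \qquad (34)
$$

where $\Phi_i \in \mathbb{R}$, $\Psi_i = -\frac{1}{\tau_i}\Psi(\tau_i, 0) \in \mathbb{R}^d$, $\Phi \in \mathbb{R}^M$, $X_t \in \mathbb{R}^d$, $y_t \in \mathbb{R}^M$ and $\Psi \in \mathbb{R}^{M \times d}$.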

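The second moments of the yields are given by:

$$
\begin{aligned}
\mathbb{E}(y_{ti}y_{tj}) &= \Phi_i\Phi_j + (\Phi_i\Psi_j' + \Phi_j\Psi_i')\,\mathbb{E}(X_t) + \Psi_i'\,\mathbb{E}(X_tX_t')\Psi_j + \mathbb{E}(\varepsilon_{ti}\varepsilon_{tj}) \\
&= \Phi_i\Phi_j + (\Phi_i\Psi_j' + \Phi_j\Psi_i')\,\mathbb{E}(\mathbf{X}_{t,1:d}) + \Psi_i'\,\mathbb{E}\big(vech^{-1}(\mathbf{X}_{t,d+1:d+d_2})\big)\Psi_j + \mathbb{E}(\varepsilon_{ti}\varepsilon_{tj}), \qquad (35)
\end{aligned}
$$

for $i,j = 1,\dots,M$. In (35) we need the function $vech^{-1}$. The purpose of this function is to transform the $d(d+1)/2 \times 1$ vector $\mathbf{X}_{t,d+1:d+d_2}$ into the symmetric $d \times d$ matrix $X_tX_t'$. In more detail, $\mathbf{X}_{t,d+1:d+d_2} = vech(X_tX_t')$, where $vec(X_tX_t')$ vectorizes the $d \times d$ matrix $X_tX_t'$ and $vech(X_tX_t')$ eliminates the supra-diagonal elements from the $d^2 \times 1$ vector $vec(X_tX_t')$ (see, e.g., Poirier, 1995, page 646). Hence $vech(X_tX_t')$ is a $d(d+1)/2 \times 1$ vector. The function $vech^{-1}$ takes us back to $X_tX_t'$, i.e. $vech^{-1}$ maps the $d(d+1)/2 \times 1$ vector $\mathbb{E}(\mathbf{X}_{t,d+1:d+d_2})$ to the symmetric $d \times d$ matrix $\mathbb{E}(X_tX_t')$. For $d = 3$ this works as follows: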


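$$
vech^{-1}\big((a_1, a_2, a_3, a_4, a_5, a_6)'\big) =
\begin{pmatrix}
a_1 & a_2 & a_3 \\
a_2 & a_4 & a_5 \\
a_3 & a_5 & a_6
\end{pmatrix} \qquad (36)
$$

and thus $vech^{-1}(\mathbf{X}_{t,d+1:d+d_2}) = X_tX_t'$. By Assumption 2 we obtain $\mathbb{E}(X_{tl}\varepsilon_{ti}) = 0$ for $l = 1,\dots,d$, $i = 1,\dots,M$ and

$$
\mathbb{E}(\varepsilon_{ti}\varepsilon_{tj}) =
\begin{cases}
\sigma_i^2, & \text{for } i = j, \\
0, & \text{for } i \neq j,
\end{cases} \qquad (37)
$$

for $i,j = 1,\dots,M$. Based on this, (35) can be written as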


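$$
\mathbb{E}(y_{ti}y_{tj}) = \Phi_i\Phi_j + (\Phi_i\Psi_j' + \Phi_j\Psi_i')\,\mathbb{E}(X_t) + (m_2^{ij})'\,\mathbb{E}(\mathbf{X}_{t,d+1:d+d_2}) + \sigma_i^2\,\mathbb{1}_{(i=j)}, \qquad (38)
$$

where for $d = 3$ we define $m_2^{ij} = (\Psi_{i1}\Psi_{j1},\ \Psi_{i1}\Psi_{j2} + \Psi_{i2}\Psi_{j1},\ \Psi_{i1}\Psi_{j3} + \Psi_{i3}\Psi_{j1},\ \Psi_{i2}\Psi_{j2},\ \Psi_{i2}\Psi_{j3} + \Psi_{i3}\Psi_{j2},\ \Psi_{i3}\Psi_{j3})' \in \mathbb{R}^6$ and thus $(m_2^{ij})'\,\mathbb{E}(\mathbf{X}_{t,4:9}) = \Psi_i'\,\mathbb{E}(X_tX_t')\Psi_j$.

Regarding the third moments we observe: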

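$$
\begin{aligned}
\mathbb{E}(y_{ti}^2y_{tj}) &= \mathbb{E}\big((\Phi_i + \Psi_i'X_t + \varepsilon_{ti})^2(\Phi_j + \Psi_j'X_t + \varepsilon_{tj})\big) \\
&= \mathbb{E}\big((\Phi_i + \Psi_i'X_t)^2(\Phi_j + \Psi_j'X_t) + (\Phi_i + \Psi_i'X_t)^2\varepsilon_{tj} + 2(\Phi_i + \Psi_i'X_t)(\Phi_j + \Psi_j'X_t)\varepsilon_{ti}\big) \\
&\quad + \mathbb{E}\big(2(\Phi_i + \Psi_i'X_t)\varepsilon_{ti}\varepsilon_{tj} + (\Phi_j + \Psi_j'X_t)\varepsilon_{ti}^2 + \varepsilon_{ti}^2\varepsilon_{tj}\big) \\
&= \Phi_i^2\Phi_j + (\Phi_i^2\Psi_j' + 2\Phi_i\Phi_j\Psi_i')\,\mathbb{E}(X_t) + 2\Phi_i\Psi_i'\,\mathbb{E}(X_tX_t')\Psi_j + \Phi_j\Psi_i'\,\mathbb{E}(X_tX_t')\Psi_i + \mathbb{E}\big((\Psi_i'X_t)^2\Psi_j'X_t\big) \\
&\quad + 2(\Phi_i + \Psi_i'\,\mathbb{E}(X_t))\sigma_i^2\,\mathbb{1}_{(i=j)} + (\Phi_j + \Psi_j'\,\mathbb{E}(X_t))\sigma_i^2 \\
&= (\Phi_i^2 + \sigma_i^2)\Phi_j + (\Phi_i^2\Psi_j' + 2\Phi_i\Phi_j\Psi_i' + \sigma_i^2\Psi_j')\,\mathbb{E}(X_t) + \big(2\Phi_i(m_2^{ij})' + \Phi_j(m_2^{ii})'\big)\,\mathbb{E}(\mathbf{X}_{t,d+1:d+d_2}) \\
&\quad + (m_3^{iij})'\,\mathbb{E}(\mathbf{X}_{t,d+d_2+1:d+d_2+d_3}) + 2(\Phi_i + \Psi_i'\,\mathbb{E}(X_t))\sigma_i^2\,\mathbb{1}_{(i=j)}, \qquad (39)
\end{aligned}
$$

where, for $d = 3$, $m_3^{iij} = (\Psi_{i1}^2\Psi_{j1},\ \Psi_{i1}^2\Psi_{j2} + 2\Psi_{i1}\Psi_{i2}\Psi_{j1},\ \Psi_{i1}^2\Psi_{j3} + 2\Psi_{i1}\Psi_{i3}\Psi_{j1},\ \Psi_{i2}^2\Psi_{j1} + 2\Psi_{i1}\Psi_{i2}\Psi_{j2},\ 2(\Psi_{i1}\Psi_{i2}\Psi_{j3} + \Psi_{i1}\Psi_{i3}\Psi_{j2} + \Psi_{i2}\Psi_{i3}\Psi_{j1}),\ \Psi_{i3}^2\Psi_{j1} + 2\Psi_{i1}\Psi_{i3}\Psi_{j3},\ \Psi_{i2}^2\Psi_{j2},\ \Psi_{i2}^2\Psi_{j3} + 2\Psi_{i2}\Psi_{i3}\Psi_{j2},\ \Psi_{i3}^2\Psi_{j2} + 2\Psi_{i2}\Psi_{i3}\Psi_{j3},\ \Psi_{i3}^2\Psi_{j3})' \in \mathbb{R}^{10}$ and $\mathbb{1}_{(\cdot)}$ stands for an indicator function. By Assumption 2 we get $\mathbb{E}(X_{tl}\varepsilon_{ti}) = 0$, $\mathbb{E}(X_{tl}^2\varepsilon_{ti}) = 0$, $\mathbb{E}(X_{tl}\varepsilon_{ti}\varepsilon_{tj}) = 0$ for $i \neq j$, and $\mathbb{E}(\varepsilon_{ti}^2\varepsilon_{tj}) = 0$ for $l = 1,\dots,d$ and $i,j = 1,\dots,M$. In a similar way and under the same assumptions, it can be shown that

$$
\begin{aligned}
\mathbb{E}(y_{ti}y_{tj}^2) &= (\Phi_j^2 + \sigma_j^2)\Phi_i + (\Phi_j^2\Psi_i' + 2\Phi_i\Phi_j\Psi_j' + \sigma_j^2\Psi_i')\,\mathbb{E}(X_t) + \big(2\Phi_j(m_2^{ij})' + \Phi_i(m_2^{jj})'\big)\,\mathbb{E}(\mathbf{X}_{t,d+1:d+d_2}) \\
&\quad + (m_3^{ijj})'\,\mathbb{E}(\mathbf{X}_{t,d+d_2+1:d+d_2+d_3}) + 2(\Phi_j + \Psi_j'\,\mathbb{E}(X_t))\sigma_j^2\,\mathbb{1}_{(i=j)}, \qquad (40)
\end{aligned}
$$

where $m_3^{ijj} = (\Psi_{i1}\Psi_{j1}^2,\ \Psi_{i2}\Psi_{j1}^2 + 2\Psi_{i1}\Psi_{j1}\Psi_{j2},\ \Psi_{i3}\Psi_{j1}^2 + 2\Psi_{i1}\Psi_{j1}\Psi_{j3},\ \Psi_{i1}\Psi_{j2}^2 + 2\Psi_{i2}\Psi_{j1}\Psi_{j2},\ 2(\Psi_{i1}\Psi_{j2}\Psi_{j3} + \Psi_{i2}\Psi_{j1}\Psi_{j3} + \Psi_{i3}\Psi_{j1}\Psi_{j2}),\ \Psi_{i1}\Psi_{j3}^2 + 2\Psi_{i3}\Psi_{j1}\Psi_{j3},\ \Psi_{i2}\Psi_{j2}^2,\ \Psi_{i3}\Psi_{j2}^2 + 2\Psi_{i2}\Psi_{j2}\Psi_{j3},\ \Psi_{i2}\Psi_{j3}^2 + 2\Psi_{i3}\Psi_{j2}\Psi_{j3},\ \Psi_{i3}\Psi_{j3}^2)' \in \mathbb{R}^{10}$ for $d = 3$. For the fourth moment we obtain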


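$$
\begin{aligned}
\mathbb{E}(y_{ti}^2y_{tj}^2) &= \mathbb{E}\big((\Phi_i + \Psi_i'X_t + \varepsilon_{ti})^2(\Phi_j + \Psi_j'X_t + \varepsilon_{tj})^2\big) \\
&= \mathbb{E}\big((\Phi_i^2 + (\Psi_i'X_t)^2 + \varepsilon_{ti}^2 + 2\Phi_i\Psi_i'X_t + 2\Phi_i\varepsilon_{ti} + 2\Psi_i'X_t\varepsilon_{ti}) \\
&\qquad \times (\Phi_j^2 + (\Psi_j'X_t)^2 + \varepsilon_{tj}^2 + 2\Phi_j\Psi_j'X_t + 2\Phi_j\varepsilon_{tj} + 2\Psi_j'X_t\varepsilon_{tj})\big) \\
&= \Phi_i^2\Phi_j^2 + \Phi_i^2\sigma_j^2 + \Phi_j^2\sigma_i^2 + \mathbb{E}(\varepsilon_{ti}^2\varepsilon_{tj}^2) \\
&\quad + 2\Phi_i\Psi_i'\,\mathbb{E}(X_t)(\Phi_j^2 + \sigma_j^2) + 2\Phi_j\Psi_j'\,\mathbb{E}(X_t)(\Phi_i^2 + \sigma_i^2) \\
&\quad + (m_2^{ii})'\,\mathbb{E}(\mathbf{X}_{t,d+1:d+d_2})(\Phi_j^2 + \sigma_j^2) + (m_2^{jj})'\,\mathbb{E}(\mathbf{X}_{t,d+1:d+d_2})(\Phi_i^2 + \sigma_i^2) + 4\Phi_i\Phi_j(m_2^{ij})'\,\mathbb{E}(\mathbf{X}_{t,d+1:d+d_2}) \\
&\quad + 2\Phi_j(m_3^{iij})'\,\mathbb{E}(\mathbf{X}_{t,d+d_2+1:d+d_2+d_3}) + 2\Phi_i(m_3^{ijj})'\,\mathbb{E}(\mathbf{X}_{t,d+d_2+1:d+d_2+d_3}) \\
&\quad + (m_4^{ij})'\,\mathbb{E}(\mathbf{X}_{t,d+d_2+d_3+1:d+d_2+d_3+d_4}) \\
&\quad + 4\sigma_i^2\big[\Phi_i^2 + 2\Phi_i\Psi_i'\,\mathbb{E}(X_t) + (m_2^{ii})'\,\mathbb{E}(\mathbf{X}_{t,d+1:d+d_2})\big]\,\mathbb{1}_{(i=j)}, \qquad (41)
\end{aligned}
$$

where $(m_4^{ij})'\,\mathbb{E}(\mathbf{X}_{t,d+d_2+d_3+1:d+d_2+d_3+d_4}) = \mathbb{E}\big((\Psi_i'X_t)^2(\Psi_j'X_t)^2\big)$. By sticking to Assumption 2, the expectation $\mathbb{E}(\varepsilon_{ti}^2\varepsilon_{tj}^2)$ equals $\mathbb{E}(\varepsilon_{ti}^4)$ for $i = j$ and $\sigma_i^2\sigma_j^2$ for $i \neq j$. Moreover, for $d = 3$,

$$
\begin{aligned}
m_4^{ij} = \big(&\Psi_{i1}^2\Psi_{j1}^2,\ 2\Psi_{i1}\Psi_{j1}(\Psi_{i2}\Psi_{j1} + \Psi_{i1}\Psi_{j2}),\ 2\Psi_{i1}\Psi_{j1}(\Psi_{i1}\Psi_{j3} + \Psi_{i3}\Psi_{j1}), \\
&\Psi_{i1}^2\Psi_{j2}^2 + \Psi_{i2}^2\Psi_{j1}^2 + 4\Psi_{i1}\Psi_{i2}\Psi_{j1}\Psi_{j2}, \\
&4\Psi_{i1}\Psi_{j1}(\Psi_{i2}\Psi_{j3} + \Psi_{i3}\Psi_{j2}) + 2(\Psi_{i1}^2\Psi_{j2}\Psi_{j3} + \Psi_{i2}\Psi_{i3}\Psi_{j1}^2), \\
&\Psi_{i1}^2\Psi_{j3}^2 + \Psi_{i3}^2\Psi_{j1}^2 + 4\Psi_{i1}\Psi_{i3}\Psi_{j1}\Psi_{j3},\ 2\Psi_{i2}\Psi_{j2}(\Psi_{i2}\Psi_{j1} + \Psi_{i1}\Psi_{j2}), \\
&4\Psi_{i2}\Psi_{j2}(\Psi_{i3}\Psi_{j1} + \Psi_{i1}\Psi_{j3}) + 2(\Psi_{i1}\Psi_{i3}\Psi_{j2}^2 + \Psi_{i2}^2\Psi_{j1}\Psi_{j3}), \\
&4\Psi_{i3}\Psi_{j3}(\Psi_{i1}\Psi_{j2} + \Psi_{i2}\Psi_{j1}) + 2(\Psi_{i1}\Psi_{i2}\Psi_{j3}^2 + \Psi_{i3}^2\Psi_{j1}\Psi_{j2}), \\
&2\Psi_{i3}\Psi_{j3}(\Psi_{i3}\Psi_{j1} + \Psi_{i1}\Psi_{j3}),\ \Psi_{i2}^2\Psi_{j2}^2,\ 2\Psi_{i2}\Psi_{j2}(\Psi_{i3}\Psi_{j2} + \Psi_{i2}\Psi_{j3}), \\
&\Psi_{i2}^2\Psi_{j3}^2 + \Psi_{i3}^2\Psi_{j2}^2 + 4\Psi_{i2}\Psi_{i3}\Psi_{j2}\Psi_{j3},\ 2\Psi_{i3}\Psi_{j3}(\Psi_{i2}\Psi_{j3} + \Psi_{i3}\Psi_{j2}),\ \Psi_{i3}^2\Psi_{j3}^2\big)' \in \mathbb{R}^{15}.
\end{aligned}
$$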

For the auto-covariance of the yields and the auto-covariance of the squared yields we have to calculate $\mathbb{E}(X_t^{\nu}(X_s^{\mu})')$ (which will become clear later). Before we proceed with these moments we obtain the result that only $x^l$ with exponents $l \leq \nu$ enter into the calculation of the conditional moment $\mathbb{E}(X_t^{\nu} \mid X_s = x)$. In addition we derive a result on the structure of $\exp((t-s)A)$, which is presented in the following lemma:


Lemma 1. Let $D$ and $B$ be $n \times n$ lower-block triangular matrices such that $D_{m_i:n_i,\,n_i+1:n} = B_{m_i:n_i,\,n_i+1:n} = 0$, where $m_i \leq n_i$ for $i = 1,\dots,k$, $k \leq n$, $n_k = n$ and $n_i < n_{i+1}$ for $i = 1,\dots,k-1$. Then the matrix $C = DB$ is of the same structure, namely $C_{m_i:n_i,\,n_i+1:n} = 0$, where $m_i \leq n_i$ and $n_i < n_{i+1}$ for $i = 1,\dots,k-1$.

Proof: Let $j$ and $l$ be such that there exists $i \in \{1,\dots,k\}$ such that $m_i \leq j \leq n_i$, $n_i + 1 \leq l \leq n$. Then $C_{jl} = (D_{j1},\dots,D_{jn_i},0,\dots,0)(0,\dots,0,B_{n_i+1,l},\dots,B_{nl})' = 0$. □
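The lemma can be checked numerically on a small example. The sketch below is only an illustration under assumed values: the block boundaries `[1, 3, 6]` and the fill functions are arbitrary, and the helper names are hypothetical. It builds two lower-block triangular matrices, multiplies them and verifies that the zero blocks survive.

```python
def matmul(D, B):
    """Plain matrix product of two square lists-of-lists."""
    n = len(D)
    return [[sum(D[j][r] * B[r][l] for r in range(n)) for l in range(n)]
            for j in range(n)]


def block_lower(n, block_ends, fill):
    """n x n matrix whose rows in block i are zero right of column n_i.

    block_ends holds the (1-based) last index n_i of every block.
    """
    M = [[0] * n for _ in range(n)]
    start = 0
    for end in block_ends:
        for j in range(start, end):
            for l in range(end):
                M[j][l] = fill(j, l)
        start = end
    return M


def has_block_structure(C, block_ends):
    """True if every row of block i vanishes in the columns n_i + 1, ..., n."""
    start = 0
    for end in block_ends:
        for j in range(start, end):
            if any(C[j][l] != 0 for l in range(end, len(C))):
                return False
        start = end
    return True


block_ends = [1, 3, 6]  # assumed block boundaries n_1, n_2, n_3
D = block_lower(6, block_ends, lambda j, l: j + 2 * l + 1)
B = block_lower(6, block_ends, lambda j, l: 3 * j - l + 2)
C = matmul(D, B)
```

Since every power of such a matrix keeps the structure, the same holds for the partial sums of the exponential series, which is the property used for $\exp((t-s)A)$.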

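Note that for a square matrix $B$, $\exp(B) = \sum_{k=0}^{\infty} \frac{B^k}{k!}$. Thus, if $B$ is a matrix of the structure described in the lemma then $\exp(B)$ has the same structure as well. As the matrix $(t-s)A$ is of the structure described in Lemma 1, this and the definition of $\exp((t-s)A)$ imply that also the matrix $\exp((t-s)A)$ is of that same structure. Thus,

$$
\exp((t-s)A)_{N_{\nu-1}+1:N_{\nu},\,N_{\nu}+1:N} = 0,
$$

which gives

$$
\begin{aligned}
&\exp((t-s)A)_{N_{\nu-1}+1:N_{\nu},\,:}\,\big[1, (x^1)', (x^2)', \dots, (x^p)'\big]' \\
&= \big[\exp((t-s)A)_{N_{\nu-1}+1:N_{\nu},\,1:N_{\nu}},\ \exp((t-s)A)_{N_{\nu-1}+1:N_{\nu},\,N_{\nu}+1:N}\big]\,\big[1, (x^1)', \dots, (x^{\nu})', (x^{\nu+1})', \dots, (x^p)'\big]' \\
&= \big[\exp((t-s)A)_{N_{\nu-1}+1:N_{\nu},\,1:N_{\nu}},\ 0_{d_{\nu} \times (N-N_{\nu})}\big]\,\big[1, (x^1)', \dots, (x^{\nu})', (x^{\nu+1})', \dots, (x^p)'\big]' \\
&= \exp((t-s)A)_{N_{\nu-1}+1:N_{\nu},\,1:N_{\nu}}\,\big[1, (x^1)', \dots, (x^{\nu})'\big]' + 0_{d_{\nu} \times (N-N_{\nu})}\,\big[(x^{\nu+1})', \dots, (x^p)'\big]' \\
&= \exp((t-s)A)_{N_{\nu-1}+1:N_{\nu},\,1:N_{\nu}}\,\big[1, (x^1)', \dots, (x^{\nu})'\big]'.
\end{aligned}
$$

(13) and the above calculations show that only $x^l$ with $l \leq \nu$ enter into the calculation of the conditional moment $\mathbb{E}(X_t^{\nu} \mid X_s = x)$. The conditional expectation of the $\nu$-th moment of $X_t$ with respect to $X_s = x$ is

$$
\begin{aligned}
\mathbb{E}(X_t^{\nu} \mid X_s = x) &= \mathbb{E}(\mathbf{X}_{t,N_{\nu-1}+1:N_{\nu}} \mid X_s = x) = \exp((t-s)\Delta A)_{N_{\nu-1}+1:N_{\nu},\,1:N_{\nu}}\,\big[1, (x^1)', \dots, (x^{\nu})'\big]' \\
&= \exp((t-s)\Delta A)_{N_{\nu-1}+1:N_{\nu},\,1} + \exp((t-s)\Delta A)_{N_{\nu-1}+1:N_{\nu},\,2:1+d_1}\,x^1 + \dots \\
&\quad + \exp((t-s)\Delta A)_{N_{\nu-1}+1:N_{\nu},\,N_{\nu-1}+1:N_{\nu}}\,x^{\nu}, \qquad (42)
\end{aligned}
$$

which is of dimension $d_{\nu} \times 1$; $\Delta \in \mathbb{R}_{++}$ is the step-width already defined in Section 3.3. This implies

$$
\begin{aligned}
\mathbb{E}(X_t^{\nu}(X_s^{\mu})') &= \mathbb{E}\big(\mathbb{E}(X_t^{\nu} \mid X_s)(X_s^{\mu})'\big) \\
&= \exp((t-s)\Delta A)_{N_{\nu-1}+1:N_{\nu},\,1}\,\mathbb{E}((X_s^{\mu})') + \exp((t-s)\Delta A)_{N_{\nu-1}+1:N_{\nu},\,2:1+d_1}\,\mathbb{E}(X_s^1(X_s^{\mu})') \\
&\quad + \dots + \exp((t-s)\Delta A)_{N_{\nu-1}+1:N_{\nu},\,N_{\nu-1}+1:N_{\nu}}\,\mathbb{E}(X_s^{\nu}(X_s^{\mu})'). \qquad (43)
\end{aligned}
$$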


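Then for $t > s$ we obtain

$$
\begin{aligned}
\mathbb{E}(y_{ti}y_{si}) &= \mathbb{E}\big((\Phi_i + \Psi_i'X_t + \varepsilon_{ti})(\Phi_i + \Psi_i'X_s + \varepsilon_{si})\big) \\
&= \Phi_i^2 + 2\Phi_i\Psi_i'\,\mathbb{E}(X_t) + \Psi_i'\,\mathbb{E}(X_tX_s')\Psi_i \\
&= \Phi_i^2 + 2\Phi_i\Psi_i'\,\mathbb{E}(X_t) + \Psi_i'\exp((t-s)\Delta A)_{2:1+d,\,1}\,\mathbb{E}(X_t')\Psi_i + \Psi_i'\exp((t-s)\Delta A)_{2:1+d,\,2:1+d}\,\mathbb{E}(X_tX_t')\Psi_i \\
&= \Phi_i^2 + 2\Phi_i\Psi_i'\,\mathbb{E}(\mathbf{X}_{t,1:d}) + \Psi_i'\exp((t-s)\Delta A)_{2:1+d,\,1}\,\mathbb{E}(\mathbf{X}_{t,1:d}')\Psi_i \\
&\quad + \Psi_i'\exp((t-s)\Delta A)_{2:1+d,\,2:1+d}\,\mathbb{E}\big(vech^{-1}(\mathbf{X}_{t,1+d:d+d_2})\big)\Psi_i, \qquad (44)
\end{aligned}
$$

and

$$
\begin{aligned}
\mathbb{E}(y_{ti}^2y_{si}^2) &= \mathbb{E}\big((\Phi_i + \Psi_i'X_t + \varepsilon_{ti})^2(\Phi_i + \Psi_i'X_s + \varepsilon_{si})^2\big) \\
&= \mathbb{E}\big((\Phi_i^2 + (\Psi_i'X_t)^2 + \varepsilon_{ti}^2 + 2\Phi_i\Psi_i'X_t + 2\Phi_i\varepsilon_{ti} + 2\Psi_i'X_t\varepsilon_{ti}) \\
&\qquad \times (\Phi_i^2 + (\Psi_i'X_s)^2 + \varepsilon_{si}^2 + 2\Phi_i\Psi_i'X_s + 2\Phi_i\varepsilon_{si} + 2\Psi_i'X_s\varepsilon_{si})\big) \\
&= \Phi_i^4 + 2\Phi_i^2\,\mathbb{E}((\Psi_i'X_t)^2) + 2\Phi_i^2\sigma_i^2 + 4\Phi_i^3\Psi_i'\,\mathbb{E}(X_t) + (\sigma_i^2)^2 + \mathbb{E}\big((\Psi_i'X_t)^2(\Psi_i'X_s)^2\big) \\
&\quad + 2\sigma_i^2\,\mathbb{E}((\Psi_i'X_t)^2) + 2\Phi_i\,\mathbb{E}\big((\Psi_i'X_t)^2\Psi_i'X_s + \Psi_i'X_t(\Psi_i'X_s)^2\big) + 4\Phi_i\sigma_i^2\Psi_i'\,\mathbb{E}(X_t) + 4\Phi_i^2\,\mathbb{E}\big((\Psi_i'X_t)(\Psi_i'X_s)\big) \\
&= \Phi_i^4 + 2(\Phi_i^2 + \sigma_i^2)\,\mathbb{E}((\Psi_i'X_t)^2) + 2\Phi_i^2\sigma_i^2 + (\sigma_i^2)^2 + \mathbb{E}\big((\Psi_i'X_t)^2(\Psi_i'X_s)^2\big) \\
&\quad + 2\Phi_i\,\mathbb{E}\big((\Psi_i'X_t)^2\Psi_i'X_s + \Psi_i'X_t(\Psi_i'X_s)^2\big) + 4(\Phi_i^2 + \sigma_i^2)\Phi_i\Psi_i'\,\mathbb{E}(X_t) + 4\Phi_i^2\,\mathbb{E}\big((\Psi_i'X_t)(\Psi_i'X_s)\big). \qquad (45)
\end{aligned}
$$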


To complete the calculation of these moments, the quantities $\mathbb{E}((\Psi_i'X_t)^2)$, $\mathbb{E}((\Psi_i'X_t)(\Psi_i'X_s))$, $\mathbb{E}((\Psi_i'X_t)^2\Psi_i'X_s)$, $\mathbb{E}(\Psi_i'X_t(\Psi_i'X_s)^2)$ and $\mathbb{E}((\Psi_i'X_t)^2(\Psi_i'X_s)^2)$ have to be derived. To simplify the notation, we omit the index $i$ in $\Psi_i$ in the following expressions; $\Psi_l \in \mathbb{R}$ is the element $l$ of $\Psi \in \mathbb{R}^d$ (when the index $i$ is still included this would be $\Psi_{il}$). Note that

Footnote: A step width $\Delta = 1$ was already assumed in the main text. To derive the following moments with a different step-width if necessary, $\Delta$ will be included in the following expressions.


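$$
\begin{aligned}
\mathbb{E}((\Psi'X_t)^2) &= \Psi'\,\mathbb{E}(X_tX_t')\Psi = \Psi'\,\mathbb{E}\big(vech^{-1}(X_t^2)\big)\Psi \quad \text{and} \\
\mathbb{E}((\Psi'X_t)(\Psi'X_s)) &= \Psi'\,\mathbb{E}(X_tX_s')\Psi = \Psi'\big(\mathbb{E}(\mathbb{E}(X_t \mid X_s)X_s')\big)\Psi \qquad (46) \\
&= \Psi'\big[\exp((t-s)\Delta A)_{2:1+d,\,1}\,\mathbb{E}(X_s') + \exp((t-s)\Delta A)_{2:1+d,\,2:1+d}\,\mathbb{E}(X_sX_s')\big]\Psi \\
&= \Psi'\big[\exp((t-s)\Delta A)_{2:1+d,\,1}\,\mathbb{E}(X_t') + \exp((t-s)\Delta A)_{2:1+d,\,2:1+d}\,\mathbb{E}\big(vech^{-1}(X_t^2)\big)\big]\Psi.
\end{aligned}
$$

In addition, we obtain

$$
\mathbb{E}\big((\Psi'X_t)^2\Psi'X_s\big) = \Psi'\,\mathbb{E}(X_t\Psi'X_tX_s')\Psi = \Psi'\,\mathbb{E}\big((X_tX_s')(\Psi'X_t)\big)\Psi, \qquad (47)
$$

for $t > s$. Here we observe the following equality:

$$
(X_tX_s')(\Psi'X_t) =
\begin{pmatrix}
X_{t1}X_{s1} & X_{t1}X_{s2} & \cdots & X_{t1}X_{sd} \\
X_{t2}X_{s1} & X_{t2}X_{s2} & \cdots & X_{t2}X_{sd} \\
\vdots & \vdots & & \vdots \\
X_{td}X_{s1} & X_{td}X_{s2} & \cdots & X_{td}X_{sd}
\end{pmatrix}
\sum_{l=1}^{d} \Psi_l X_{tl}. \qquad (48)
$$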


Equation (47) requires the derivation of $\mathbb{E}(X_{ti}X_{sj}X_{tl})$ where $i, j, l \in \{1,\dots,d\}$. To simplify the notation, the following functions $g_{\iota}(\cdot)$, $\iota = 2,3,4$, are introduced to facilitate tracking specific elements of the moments vectors. We obtain

$$
g_2(i,j) = (i-1)\left(d - \frac{i}{2}\right) + j, \qquad (49)
$$

for $i,j \in \mathbb{N}$ and $i \leq j \leq d$. Moreover we derive

$$
g_3(i,j,m) = \sum_{k=1}^{i-1} d(2, d-k+1) + \frac{j-i}{2}\,(2d - i - j + 3) + m - j + 1, \qquad (50)
$$

$$
g_4(i,j,m,n) = \sum_{k=1}^{i-1} d(3, d-k+1) + \sum_{k=1}^{j-i} d(2, d-i-k+2) + \left(d + 1 - \frac{m+j-1}{2}\right)(m-j) + n - m + 1, \qquad (51)
$$

for $i,j,m \in \mathbb{N}$ and $i \leq j \leq m \leq d$ and for $i,j,m,n \in \mathbb{N}$ and $i \leq j \leq m \leq n \leq d$, respectively. While $d$ was the dimension of the process $(X(t))$, $d(\cdot,\cdot)$ is the function already defined in (33). For $d = 3$ this yields




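$$
g_2(i,j) =
\begin{cases}
1, & \text{if } i = 1,\ j = 1 \\
2, & \text{if } i = 1,\ j = 2 \\
3, & \text{if } i = 1,\ j = 3 \\
4, & \text{if } i = 2,\ j = 2 \\
5, & \text{if } i = 2,\ j = 3 \\
6, & \text{if } i = 3,\ j = 3
\end{cases} \qquad (52)
$$

$$
g_3(i,j,m) =
\begin{cases}
1, & \text{if } i = 1,\ j = 1,\ m = 1 \\
2, & \text{if } i = 1,\ j = 1,\ m = 2 \\
3, & \text{if } i = 1,\ j = 1,\ m = 3 \\
4, & \text{if } i = 1,\ j = 2,\ m = 2 \\
5, & \text{if } i = 1,\ j = 2,\ m = 3 \\
6, & \text{if } i = 1,\ j = 3,\ m = 3 \\
7, & \text{if } i = 2,\ j = 2,\ m = 2 \\
8, & \text{if } i = 2,\ j = 2,\ m = 3 \\
9, & \text{if } i = 2,\ j = 3,\ m = 3 \\
10, & \text{if } i = 3,\ j = 3,\ m = 3
\end{cases} \qquad (53)
$$

and

$$
g_4(i,j,m,n) =
\begin{cases}
1, & \text{if } i = 1,\ j = 1,\ m = 1,\ n = 1 \\
2, & \text{if } i = 1,\ j = 1,\ m = 1,\ n = 2 \\
3, & \text{if } i = 1,\ j = 1,\ m = 1,\ n = 3 \\
4, & \text{if } i = 1,\ j = 1,\ m = 2,\ n = 2 \\
5, & \text{if } i = 1,\ j = 1,\ m = 2,\ n = 3 \\
6, & \text{if } i = 1,\ j = 1,\ m = 3,\ n = 3 \\
7, & \text{if } i = 1,\ j = 2,\ m = 2,\ n = 2 \\
8, & \text{if } i = 1,\ j = 2,\ m = 2,\ n = 3 \\
9, & \text{if } i = 1,\ j = 2,\ m = 3,\ n = 3 \\
10, & \text{if } i = 1,\ j = 3,\ m = 3,\ n = 3 \\
11, & \text{if } i = 2,\ j = 2,\ m = 2,\ n = 2 \\
12, & \text{if } i = 2,\ j = 2,\ m = 2,\ n = 3 \\
13, & \text{if } i = 2,\ j = 2,\ m = 3,\ n = 3 \\
14, & \text{if } i = 2,\ j = 3,\ m = 3,\ n = 3 \\
15, & \text{if } i = 3,\ j = 3,\ m = 3,\ n = 3
\end{cases} \qquad (54)
$$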
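The index functions simply return the position of a sorted index tuple in the lexicographically ordered list of all such tuples. As a hedged illustration (the helper names `position` and `g2` are hypothetical and this is not the authors' implementation), the positions can be obtained by enumeration and compared with the closed form (49):

```python
from itertools import combinations_with_replacement


def position(indices, d):
    """1-based position of a sorted index tuple among all sorted tuples
    of the same length with entries in 1..d (lexicographic order)."""
    basis = list(combinations_with_replacement(range(1, d + 1), len(indices)))
    return basis.index(tuple(indices)) + 1


def g2(i, j, d):
    """Closed form (i - 1)(d - i/2) + j, written in integer arithmetic."""
    return (i - 1) * (2 * d - i) // 2 + j


# Compare the closed form with the enumeration for two dimensions.
g2_matches = all(
    g2(i, j, d) == position((i, j), d)
    for d in (3, 4)
    for i in range(1, d + 1)
    for j in range(i, d + 1)
)
```

The enumeration also reproduces the tabulated values of $g_3$ and $g_4$ for $d = 3$, for example `position((2, 3, 3), 3)` for the ninth third-order element.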


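Let $e$ be a vector of ones, $e = (1,\dots,1)'$, and $\bar{e} = (1,2,\dots,d)'$. Then $M$, $M^{j}$ and $M^{j,l}$ are the following $d_2 \times 2$, $d_2 \times 3$ and $d_2 \times 4$ matrices:

$$
M =
\begin{pmatrix}
1 \cdot e & \bar{e}_{1:d} \\
2 \cdot e & \bar{e}_{2:d} \\
\vdots & \vdots \\
d & d
\end{pmatrix},
\quad \text{which for } d = 3 \text{ is } \quad
M =
\begin{pmatrix}
1 & 1 \\
1 & 2 \\
1 & 3 \\
2 & 2 \\
2 & 3 \\
3 & 3
\end{pmatrix}, \qquad (55)
$$

$$
M^{j} = (M, j e) \quad \text{and} \quad M^{j,l} = (M^{j}, l e) = (M, j e, l e), \qquad (56)
$$

where $e$ is here a vector of ones of the dimension $d_2 \times 1$. Thus, for $t > s$

Footnote: The dimension of $e$ is not specified on purpose as it will vary and will be clear from the context.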
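$$
\begin{aligned}
\mathbb{E}(X_{ti}X_{sj}X_{tl}) &= \mathbb{E}\big(\mathbb{E}(X_{ti}X_{tl} \mid X_s)X_{sj}\big) \\
&= \exp((t-s)\Delta A)_{k,1}\,\mathbb{E}(X_{sj}) + \exp((t-s)\Delta A)_{k,\,2:1+d}\,\mathbb{E}(X_sX_{sj}) \\
&\quad + \exp((t-s)\Delta A)_{k,\,2+d:1+d+d_2}\,\mathbb{E}(X_s^2X_{sj}), \qquad (57)
\end{aligned}
$$

where $k = 1 + d + g_2(i,l)$. By stationarity, the expectations on the right-hand side are elements of the unconditional moment vectors of $X_t$; they are located by means of the index functions $g_2(\cdot)$ and $g_3(\cdot)$ and the matrix $M^{j}$. Thus, the $(i,j)$ element, $i,j = 1,\dots,d$, of the matrix in (47) is

$$
\big[\mathbb{E}\big((X_tX_s')(\Psi'X_t)\big)\big]_{ij} = \sum_{l=1}^{d} \Psi_l\,\mathbb{E}(X_{ti}X_{sj}X_{tl}),
$$

where $\mathbb{E}(X_{ti}X_{sj}X_{tl})$ is given by (57). Let $t > s$, then

$$
\mathbb{E}\big(\Psi'X_t(\Psi'X_s)^2\big) = \Psi'\,\mathbb{E}(X_t\Psi'X_sX_s')\Psi = \Psi'\,\mathbb{E}\big((X_tX_s')(\Psi'X_s)\big)\Psi, \qquad (58)
$$

where

$$
(X_tX_s')(\Psi'X_s) =
\begin{pmatrix}
X_{t1}X_{s1} & \cdots & X_{t1}X_{sd} \\
\vdots & & \vdots \\
X_{td}X_{s1} & \cdots & X_{td}X_{sd}
\end{pmatrix}
\sum_{l=1}^{d} \Psi_l X_{sl}
= \sum_{l=1}^{d} \Psi_l
\begin{pmatrix}
X_{t1}X_{s1}X_{sl} & \cdots & X_{t1}X_{sd}X_{sl} \\
\vdots & & \vdots \\
X_{td}X_{s1}X_{sl} & \cdots & X_{td}X_{sd}X_{sl}
\end{pmatrix}. \qquad (59)
$$

For expression (59) one needs to know $\mathbb{E}(X_{ti}X_{sj}X_{sl})$ where $i,j,l \in \{1,\dots,d\}$. Thus, for $t > s$

$$
\begin{aligned}
\mathbb{E}(X_{ti}X_{sj}X_{sl}) &= \mathbb{E}\big(\mathbb{E}(X_{ti} \mid X_s)X_{sj}X_{sl}\big) \qquad (60) \\
&= \exp((t-s)\Delta A)_{i+1,\,1}\,\mathbb{E}(X_{sj}X_{sl}) + \exp((t-s)\Delta A)_{i+1,\,2:1+d}\,\mathbb{E}(X_sX_{sj}X_{sl}),
\end{aligned}
$$

where the expectations on the right-hand side are again elements of the unconditional moment vectors. The $(i,j)$ element, $i,j = 1,\dots,d$, of the matrix in (59) is

$$
\big[\mathbb{E}\big((X_tX_s')(\Psi'X_s)\big)\big]_{ij} = \sum_{l=1}^{d} \Psi_l\,\mathbb{E}(X_{ti}X_{sj}X_{sl}),
$$

where $\mathbb{E}(X_{ti}X_{sj}X_{sl})$ is given by (60). Finally

$$
\mathbb{E}\big((\Psi'X_t)^2(\Psi'X_s)^2\big) = \Psi'\,\mathbb{E}\big(X_t(\Psi'X_t)(\Psi'X_s)X_s'\big)\Psi = \Psi'\,\mathbb{E}\big((X_tX_s')(\Psi'X_t)(\Psi'X_s)\big)\Psi \qquad (61)
$$

and

$$
(X_tX_s')(\Psi'X_t)(\Psi'X_s) =
\begin{pmatrix}
X_{t1}X_{s1} & \cdots & X_{t1}X_{sd} \\
\vdots & & \vdots \\
X_{td}X_{s1} & \cdots & X_{td}X_{sd}
\end{pmatrix}
\sum_{k=1}^{d}\sum_{l=1}^{d} \Psi_k\Psi_l X_{tk}X_{sl}
= \sum_{k=1}^{d}\sum_{l=1}^{d} \Psi_k\Psi_l
\begin{pmatrix}
X_{t1}X_{s1}X_{tk}X_{sl} & \cdots & X_{t1}X_{sd}X_{tk}X_{sl} \\
\vdots & & \vdots \\
X_{td}X_{s1}X_{tk}X_{sl} & \cdots & X_{td}X_{sd}X_{tk}X_{sl}
\end{pmatrix}. \qquad (62)
$$

Then for $t > s$ we have

$$
\begin{aligned}
\mathbb{E}(X_{ti}X_{sj}X_{tm}X_{sn}) &= \mathbb{E}\big(\mathbb{E}(X_{ti}X_{tm} \mid X_s)X_{sj}X_{sn}\big) \\
&= \exp((t-s)\Delta A)_{k,1}\,\mathbb{E}(X_{sj}X_{sn}) + \exp((t-s)\Delta A)_{k,\,2:1+d}\,\mathbb{E}(X_sX_{sj}X_{sn}) \\
&\quad + \exp((t-s)\Delta A)_{k,\,2+d:1+d+d_2}\,\mathbb{E}(X_s^2X_{sj}X_{sn}), \qquad (63)
\end{aligned}
$$

where $k = 1 + d + g_2(i,m)$. The $(i,j)$ element, $i,j = 1,\dots,d$, of the matrix in (61) is

$$
\big[\mathbb{E}\big((X_tX_s')(\Psi'X_t)(\Psi'X_s)\big)\big]_{ij} = \sum_{k=1}^{d}\sum_{l=1}^{d} \Psi_k\Psi_l\,\mathbb{E}(X_{ti}X_{sj}X_{tk}X_{sl}),
$$

with expectations being given by (63).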


D Solving for $\Phi(t,u)$ and $\Psi(t,u)$

This section derives the functions $\Phi(t,u)$ and $\Psi(t,u)$ solving the Riccati differential equations for an $A_m(d)$ model with diagonal $\beta_{II}^Q$. By equation (6), which is based on Filipović (2009) [Theorem 10.4 and Corollary 10.2], $\Psi(t,u)$ and $\Phi(t,u)$ evaluated at $t = \tau_i$, $i = 1,\dots,M$, and $u = 0_{d \times 1}$ are necessary to compute the zero coupon prices $\pi^0(t,\tau_i)$ and the corresponding model yields. For the Vasicek (1977) and the Cox et al. (1985) model the solutions are presented e.g. in Filipović (2009) [pp. 162-163].

Now we apply the results obtained in Grasselli and Tebaldi (2008) [Section 3.4.1] for $A_m(d)$ models with diagonal $m \times m$ matrix $\beta_{II}^Q$. In the first step we have to solve the linear ODE for the $J$ components. I.e. we consider

$$
\partial_t \Psi_J(t,u) = (\beta_{JJ}^Q)'\,\Psi_J(t,u) - \gamma_{xJ}, \qquad \Psi_J(0,u) = u_J, \qquad \gamma_{xJ} = e_{n \times 1}. \qquad (64)
$$

Footnote: Note that the dimension of $\Psi_J$ is $n$.

A particular solution of (64) is of the structure

$$
\Psi_J(t,u) = \exp\big((\beta_{JJ}^Q)'t\big)c_1 + c_2, \qquad (65)
$$

with

$$
\Psi_J(0,u) = c_1 + c_2 = u_J. \qquad (66)
$$

Then (65) implies

$$
\partial_t \Psi_J(t,u) = (\beta_{JJ}^Q)'\exp\big((\beta_{JJ}^Q)'t\big)c_1. \qquad (67)
$$

Plugging (65) and (67) into (64) yields

$$
(\beta_{JJ}^Q)'\exp\big((\beta_{JJ}^Q)'t\big)c_1 = (\beta_{JJ}^Q)'\Big(\exp\big((\beta_{JJ}^Q)'t\big)c_1 + c_2\Big) - \gamma_{xJ},
$$

which gives $\gamma_{xJ} = (\beta_{JJ}^Q)'c_2$ and thus $c_2 = \big((\beta_{JJ}^Q)'\big)^{-1}\gamma_{xJ}$. This and (66) imply that $c_1 = u_J - \big((\beta_{JJ}^Q)'\big)^{-1}\gamma_{xJ}$. Plugging the last expression and $c_2$ into (65) gives

$$
\begin{aligned}
\Psi_J(t,u) &= \exp\big((\beta_{JJ}^Q)'t\big)u_J - \Big(\exp\big((\beta_{JJ}^Q)'t\big) - I_n\Big)\big((\beta_{JJ}^Q)'\big)^{-1}\gamma_{xJ} \\
&= \exp\big(t(\beta_{JJ}^Q)'\big)u_J - \big((\beta_{JJ}^Q)'\big)^{-1}\Big(\exp\big(t(\beta_{JJ}^Q)'\big) - I_n\Big)\gamma_{xJ}. \qquad (68)
\end{aligned}
$$
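In the scalar case ($n = 1$) the closed form (68) is easy to check against a numerical solution of (64). The sketch below is an illustration only; the parameter values and function names are assumptions, not taken from the paper. It integrates $\partial_t\Psi = \beta\Psi - \gamma$ with a classical Runge-Kutta scheme and compares the result with the closed form.

```python
import math


def psi_closed_form(t, u, beta, gamma):
    """Scalar version of (68): exp(beta t) u - (exp(beta t) - 1) / beta * gamma."""
    return math.exp(beta * t) * u - (math.exp(beta * t) - 1.0) / beta * gamma


def psi_rk4(t, u, beta, gamma, steps=1000):
    """Integrate d/dt psi = beta * psi - gamma, psi(0) = u, with classical RK4."""
    h = t / steps
    psi = u

    def rhs(p):
        return beta * p - gamma

    for _ in range(steps):
        k1 = rhs(psi)
        k2 = rhs(psi + 0.5 * h * k1)
        k3 = rhs(psi + 0.5 * h * k2)
        k4 = rhs(psi + h * k3)
        psi += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return psi


# Assumed illustrative values: beta = -0.5, gamma = 1, u = 0, maturity t = 5.
closed = psi_closed_form(5.0, 0.0, -0.5, 1.0)
numeric = psi_rk4(5.0, 0.0, -0.5, 1.0)
```

With $u = 0$ the solution moves from $0$ towards the steady state $\gamma/\beta$, which is a quick plausibility check of the sign conventions in (64) and (68).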


In a second step the solution of the subsystem $\Psi_J(t,u)$ is plugged into the ODEs for the square root terms. Thus, the Riccati equations for the $I$ components are

Footnote: The solution of (64) also follows from Perko (1991) [Theorem 1, p. 60]. The matrix product in the last expression of (68) can be exchanged by the properties of the matrix exponential. I.e. $(\beta_{JJ})^{-1}\exp(t\beta_{JJ}) = \exp(t\beta_{JJ})(\beta_{JJ})^{-1}$ follows from $\exp(YXY^{-1}) = Y\exp(X)Y^{-1}$.


9iTi(t,u) = + 132^1 -%i 


%i{t, u) = ^ .(T^j(t, u))j - ^ ^ [^j{t, u))] 


i=i 


t=i 


Ti(0,u) = Ui , 73,, = 1 ,z = 1 ,... ,m. 


(69) 


As (j69p is a time inhomogeneous Riccati equation, it can be solved in the following way: The ODE of 
interest is SfTj = + (32^i — Jxi for z = 1,..., m. After the substitution Ui = SfTj, z = 1,..., m, we 


2 i i ' 

-i,,2 I flQ.,. v2 


ge t dfUj = ki'f + — A solution for an inhomogenous Riccati ODE of this structure is provided 


m 


Grasselli and Tebaldil (2003) [Section 3.4.1], The solution for z/j is 


r'j(t,u) = 


Aff ^ (t, u) Ui + {t, u) 

Af^*^ (t, u) Ui + Af^*^ (t, u) 


MW(t,u) = 




where 


( 0 / 




( 0 / 


= exp 


-t/2 0 


. (70) 


At the end T, = for z = 1 ,..., m. With u = 0,^x1 we get 




1 A/f(t,0) 


Aff (t, 0 ) 


, for z = 1 ,..., m. 


(71) 


To derive MW(t, u) the integral %i(s, u)ds has to be solved, where 


^ pt 1 ^ pi, 

/ 73,i(s,u)o!s = -fxit-'^P2+j,i ['^j{s,u)]jds--'^J:l,+jl3l^+j [T'^(t,u)] ds. (72) 

Jo ^ do ^ Jo 









The second term in ()72p can be derived by means of 



using (1681) . The third term in (1721) can be derived numerically as well as the whole expression (1721) . It 
remains to calculate where by (1291) 


dMt,u) 


u)'aT'(t, u) + (b*5)' qf{t, u) , $(0, u) = 0 , 


We can express 0) by means of 0) = 0) + ^j{t, 0). The J components of the d x d matrix 

a are equal to a n x n diagonal matrix having ..., along the main diagonal such that 


/ v2 






^tn+1 0 0 

0 0 


For $/(t,0) we obtain 


\ 


0 0 y 


^j{s,0) ds+ [ (6 ^+i,...,6 ^)T'j(s,0) ds. 
Jo 


^i{t,0) = J ^i{s,0)ds. 


(74) 


0), <I>j(t,0) and ^^(^,0) can be easily obtained by means of numerical integration. To do this we 
generate a grid T = ... ,to} with G + 1 grid points. We set to = 0 and to = max(r;) = tm- 

By including the maturities in T we know that for each maturity we have tfc; = ti 


for some ki G {1,... , G + 1} 


25 


The step-widths are given by = tk — tfc-ij k = 1,...,G -|- 1 


(if some elements of T coincide with r/ this does not cause any problems since = 0 for 


Footnote 25: This has been implemented as follows: (i) generate an equally spaced grid, (ii) include the $M$ maturities, (iii) sort all these points in ascending order.
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such grid-points). Then we evaluate d>j(t,0) at each t = tk, k = -|- 1. By calculating 


the sums \YXLi ^ 


^l+i 0 
0 


Riemann sums), \Yj^k= 2 ^ ji^k,^)' 


^m+i 0 


0 


^j(4,0)Afc + Efc=”i'(&m+i,---,^d)^j(4,0)Afc (left 
T'j(4,0)Afc X;f;T2(6®+i,...,6^)’J'j(4,0)Afc 


(right Riemann sums), or 2 '^k= 2 ' 


1 \-^ki ^j{tk-i,Oy+'^j{tk,Oy I ^m+l 0 I ’l'j(tk-i, 0 )'+'i'j(tk, 0 )' 


0 


J 2 k‘= 2 (^m+i’ ■ ■ ■ ’ ^d) Afc (trapeze-rule) we get a numerical approximation of ‘hj(ri,0), 

ki = G(-|-l) for Ti = tm- In our code right sums were implemented. Since integrals of T'j and 'J'j 
are necessary to obtain f^‘ %i(t,0)dt, we use numerical integration also to obtain %i(t,0)dt. These 
proxies are then used in (f7T]l to calculate Tj(Ti, 0), i = 1 ,..., m. Equipped with T'/(tfc) 0), /c = 1 ,..., G-l-1 
we are also able to obtain a numerical approximation of ‘h/(r/, 0). 
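The grid construction and the right Riemann sums described above can be sketched as follows. This is a minimal illustration with a generic integrand `f`; the function names, the grid size and the maturities are assumptions, not the authors' code.

```python
def build_grid(t_max, n_equal, maturities):
    """Equally spaced grid on [0, t_max] with the maturities merged in,
    sorted in ascending order. Duplicate points are kept on purpose:
    they only produce steps of width zero."""
    equal = [t_max * k / n_equal for k in range(n_equal + 1)]
    return sorted(equal + list(maturities))


def right_riemann(f, grid, upper):
    """Right Riemann sum of f over the grid points up to `upper`."""
    total = 0.0
    for k in range(1, len(grid)):
        if grid[k] > upper:
            break
        total += f(grid[k]) * (grid[k] - grid[k - 1])
    return total


maturities = [0.25, 0.5, 1.0]  # assumed maturities in years
grid = build_grid(max(maturities), 1000, maturities)
approx = {tau: right_riemann(lambda t: t, grid, tau) for tau in maturities}
```

For the test integrand $f(t) = t$ the exact integrals are $\tau^2/2$, so the error of the right sums can be inspected directly; it shrinks linearly in the step width.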


E Restrictions on the Parameters

First we present the conditions for admissibility, which guarantee that $(X(t))$ remains within the state space. All these restrictions are applied under both measures, $P$ and $Q$.

Admissibility conditions (see Filipović, 2009, Theorem 10.2): $a$, $\alpha_i$ are symmetric and positive semidefinite. $a_{II} = 0_{m \times m}$, $a_{IJ} = a_{JI}' = 0_{m \times n}$, $\alpha_j = 0$ for all $j = m+1,\dots,m+n$. $\alpha_{i,kl} = \alpha_{i,lk} = 0$ for $k \in I \setminus \{i\}$ for all $1 \leq i,l \leq d$, $b \in \mathbb{R}_+^m \times \mathbb{R}^n$, $\beta_{IJ} = 0_{m \times n}$ and $\beta_{II}$ has non-negative off-diagonal elements.

In a model with diagonal diffusion matrix the admissibility restrictions are met if the Dai and Singleton (2000) conditions presented in Definition 1 are met. To keep the process $(X(t))$ off the boundaries of the state space we can impose the boundary (Feller) conditions of Aït-Sahalia and Kimmel (2010, Eq. 15-17), which bound $b_i$ from below for $i = 1,\dots,m$. (The conditions $\beta_{IJ} = 0_{m \times n}$ and $\beta_{II}$ having non-negative off-diagonal elements are already included in the admissibility conditions.)

Last but not least, we have some further restrictions for stationarity:

Stationarity conditions (see Aït-Sahalia and Kimmel, 2010, Table 1): The real parts of the eigenvalues of $\beta$ are smaller than zero. A more general treatment regarding stationarity is provided in Glasserman and Kim (2010).


F GMM-Estimation

For our model it turned out that minimizing the GMM distance function (23) is non-trivial. By using a standard minimization routine, such as the MATLAB minimization routine fminsearch based on the Nelder-Mead algorithm (see footnote 26), we observed that the estimation procedure performs poorly (see footnote 27). Therefore, as described in Step 1 below, we include multistart random search methods in our minimization procedure (see, e.g., Törn and Žilinskas, 1989). Compared to working with the above minimization routine only, this procedure improves parameter estimation, especially when looking at the means and the absolute deviation from the mean in percentage terms. Some results are presented in Table 5.

In addition, we apply classical tests such as the Wald and the distance difference test (see, e.g., Ruud, 2000; Newey and McFadden, 1994). We observe that these tests do not perform well. Some results for tests of the null hypothesis $\theta^P = \theta^Q$ against the alternative $\theta^P \neq \theta^Q$ are presented in Table 6, where it can be seen that power and size of these tests do not fulfill "the usual quality standards". We explain this behavior by the problem of estimating a relatively large ($23 \times 23$) covariance matrix and a matrix of gradients with the Wald test (see also equation (26)). Regarding the distance difference test, we observe that the ($q \times q$) weighting matrix $C_T = \Lambda^{-1}$ has a strong impact on the results of the tests, which in turn introduces potential inaccuracies in case $\Lambda$ was not estimated accurately enough.

To further improve the properties of the estimation routine, we combine multistart random search methods with Quasi-Bayesian methods (see Chernozhukov and Hong, 2003). To apply Bayesian tools a prior $\pi(\vartheta)$ has to be specified. The parameter space $\Theta$ is a subset of $\mathbb{R}^p$. It is a proper subset, since some parameters are strictly positive, nonnegative, etc. by the model assumptions. In addition, admissibility and stationarity further restrict the parameter space. Hence, the prior $\pi(\vartheta) = 0$ for all $\vartheta \notin \Theta$. In addition, to implement a random search method on a computer and to add "prior information" we restrict $\Theta$ to $\Theta_0 \subset \Theta$, where $\pi(\vartheta) = 0$ for all $\vartheta$ not contained in $\Theta_0$.


Footnote 26: See http://www.mathworks.de/de/help/matlab/ref/fminsearch.html

Footnote 27: Detailed results of these simulation experiments can be obtained from the authors on request.

The subset $\Theta_0$ is constructed as follows: For $\Sigma_i$ the lower bound is set to 0.1, while the upper bound is set to 2. The upper bound follows from the variances of the observed yields, the lower bound from the assumption that the variance of each component is not too small. For the unrestricted $B_{ij}^x$ we assume that $B_{ij}^x \in [0,2]$, where $B_{ij}^x \geq 0$ follows from the model assumptions, while $B_{ij}^x \leq 2$ is used to keep the impact of the square root term on the other volatilities bounded. In addition, $\sigma_i^2 \in [0.005, 0.025]$. This is motivated by the argument that the observation error is small compared to the variance of the yields. The observation error can be due to market-microstructure noise (see, e.g., Campbell et al., 1997; Chen et al., 2007). The lower bound is based on the assumption that at least 10 basis points can be attributed to the noise. To ensure that the matrices $\beta^P$ and $\beta^Q$ are sufficiently far away from a singular matrix, we assume $\beta_{ii} \leq -0.1$. To cope with the high degree of serial correlation of the yields, we demand $\beta_{ii} \geq -50$. For $\beta_{ij}$, $i \neq j$, we apply a lower bound of $-10$ and an upper bound of 10. The differences in the matrix exponential of $\beta$ become small when values outside these intervals are used.

Since $\theta^P$ and $\gamma_0$ determine the mean of the instantaneous spot rate $\mathbb{E}(r_t)$ defined by a stationary $(X(t))$ (see equation (2)), we bound $\mathbb{E}(r_t)$ from below and from above in terms of $[m_T(y_{1:T})]_1$, where the constant $c = 1.45$ is applied in the Bayesian sampler. Since the sample mean of the instantaneous short rate cannot be observed, we use the sample mean of the shortest maturity, which in terms of our notation is $[m_T(y_{1:T})]_1$.

In addition, the conditions on stationarity, identification and admissibility have to be met. Given these restrictions and the uniform prior on the components of $\Theta_0$, the prior $\pi(\vartheta)$ is proportional to a product of indicator functions for stationarity, identification, admissibility and the bounds described above. Summing up, all the above restrictions result in the set $\Theta_0$. For all elements $\vartheta$ contained in $\Theta_0$ we use a uniform prior and for all $\vartheta \notin \Theta_0$ we set $\pi(\vartheta) = 0$.

After the prior has been specified, parameter estimates are obtained as follows.

Step 1: Run multistart random search methods, generate $\vartheta^{(n)}$, where $n = 1,\dots,N = 2{,}000$.

Step 2: Run MCMC:

For each MCMC-step $m$, where $m = 1,\dots,M = 20{,}000$, update $\vartheta^{(m)}$ block-wise by means of the Metropolis-Hastings algorithm:

MCMC Sub-Step 1: update block $J_1$

...

MCMC Sub-Step $K$: update block $J_K$

MCMC Sub-Step: reversible jump step (with a probability of 90%).

Obtain an estimate $\hat{\vartheta}$ from the draws $\vartheta^{(m)}$, where $m = M_b + 1,\dots,M = 20{,}000$.
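Step 2 can be illustrated on a toy problem. The following sketch is not the authors' sampler: it uses a single scalar parameter, a quadratic stand-in for the GMM distance, a uniform prior on an interval, and an assumed quasi-posterior kernel $\exp(-\tfrac{1}{2} T Q_T(\vartheta))$; all names and numbers are hypothetical. It does keep the two features described in the text, a random walk proposal whose standard deviation is proportional to $|\vartheta^{old}|$ and the resulting proposal-density ratio in the acceptance probability.

```python
import math
import random


def metropolis_hastings(q_dist, T, lower, upper, start, n_steps, rel_sd, seed=0):
    """Random walk Metropolis-Hastings for the kernel exp(-0.5 * T * q_dist(theta))
    with a uniform prior on [lower, upper] (lower > 0 is assumed)."""
    rng = random.Random(seed)
    theta = start
    draws = []
    for _ in range(n_steps):
        sd_fwd = rel_sd * abs(theta)
        prop = rng.gauss(theta, sd_fwd)
        if lower <= prop <= upper:  # outside the support the prior is zero
            sd_bwd = rel_sd * abs(prop)
            # log proposal densities q(new | old) and q(old | new), constants cancel
            log_q_fwd = -math.log(sd_fwd) - 0.5 * ((prop - theta) / sd_fwd) ** 2
            log_q_bwd = -math.log(sd_bwd) - 0.5 * ((theta - prop) / sd_bwd) ** 2
            log_ratio = (-0.5 * T * q_dist(prop) + 0.5 * T * q_dist(theta)
                         + log_q_bwd - log_q_fwd)
            if rng.random() <= math.exp(min(0.0, log_ratio)):
                theta = prop
        draws.append(theta)
    return draws


# Toy distance with minimum at theta = 1 (an assumption for illustration).
draws = metropolis_hastings(lambda th: (th - 1.0) ** 2, T=50, lower=0.1,
                            upper=2.0, start=0.5, n_steps=20000, rel_sd=0.2)
burn_in = 5000
estimate = sum(draws[burn_in:]) / len(draws[burn_in:])
```

The estimate is the mean of the draws after the burn-in phase, mirroring the last line of Step 2; in the paper's setting the update is applied block by block rather than to a single scalar.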


Ad Step 1: Given the set $\Theta_0$, we randomly generate initial points $\vartheta^{(n)}$, $n = 1,\dots,N = 2{,}000$, which are independently drawn by means of $[\vartheta^{(n)}]_j = [\vartheta]_j + c_{\vartheta}[|\vartheta|]_j\varepsilon_j$ for elements $j$, $j \in \{1,\dots,p\}$, when the support is the real axis, and $\log[|\vartheta^{(n)}|]_j = \log[|\vartheta|]_j + c_{\vartheta}\varepsilon_j$ such that $[\vartheta^{(n)}]_j = \exp\big(\log([|\vartheta|]_j) + c_{\vartheta}\varepsilon_j\big)\,\mathrm{sgn}([\vartheta]_j)$ for elements $j$ from the non-positive or non-negative part of the real axis. The random variables $\varepsilon_j$ are iid standard normal and only $\vartheta^{(n)}$ with $\pi(\vartheta^{(n)}) > 0$ are used. In addition, our random search routine also generates samples where $(\theta^Q)^{(n)} = (\theta^P)^{(n)}$. This is done by setting $(\theta^Q)^{(n)}$ equal to the sampled $(\theta^P)^{(n)}$ with a probability of 80%. By sorting according to $Q_T(\vartheta^{(n)}; y_{1:T})$ in ascending order, we are equipped with the sorted draws $\vartheta_{[n]}$ and distances $Q_T(\vartheta_{[n]}; y_{1:T})$, where $Q_T(\vartheta_{[1]}; y_{1:T}) \leq Q_T(\vartheta_{[2]}; y_{1:T}) \leq \dots \leq Q_T(\vartheta_{[N]}; y_{1:T})$. The GMM distance function $Q_T(\vartheta; y_{1:T})$ is defined in (25), where $C_T = I_q$ for all $n = 1,\dots,N$.
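The drawing rule of Step 1 can be sketched as follows. This is a hedged illustration: the centre vector, the scale `c`, the support check and the distance function are assumptions made up for the example, and the function names are hypothetical.

```python
import math
import random


def draw_start(center, c, sign_restricted, rng):
    """One candidate: additive relative noise on unrestricted elements,
    multiplicative log-normal noise (sign preserved) on restricted ones."""
    candidate = []
    for value, restricted in zip(center, sign_restricted):
        eps = rng.gauss(0.0, 1.0)
        if restricted:
            magnitude = math.exp(math.log(abs(value)) + c * eps)
            candidate.append(math.copysign(magnitude, value))
        else:
            candidate.append(value + c * abs(value) * eps)
    return candidate


def multistart(center, c, sign_restricted, in_support, distance, n, seed=0):
    """Draw n candidates with positive prior and sort them by distance."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        candidate = draw_start(center, c, sign_restricted, rng)
        if in_support(candidate):  # keep only draws with pi(candidate) > 0
            draws.append(candidate)
    return sorted(draws, key=distance)


center = [0.5, -2.0, 1.0]            # assumed centre of the search
restricted = [True, True, False]     # first two elements keep their sign
starts = multistart(center, 0.5, restricted,
                    in_support=lambda th: 0.1 <= th[0] <= 2.0,
                    distance=lambda th: sum((a - b) ** 2 for a, b in zip(th, center)),
                    n=50)
```

The first element of the sorted list plays the role of $\vartheta_{[1]}$, the draw with the smallest distance.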


Ad Step 2: Based on the results in Chernozhukov and Hong (2003), the Metropolis-Hastings algorithm (see, e.g., Robert and Casella, 2004) can be used to minimize the CUE-GMM criterion function $Q_T(\cdot)$.

To do this we proceed as follows: Suppose that $\vartheta^{(m-1)}$ is available, where, just now, $m$ stands for the index of the MCMC step. For $m = 1$ the Bayesian sampler is started at the initial value $\vartheta^{(0)}$. The parameter vector to be updated, $\vartheta^{(m-1)}$, is of dimension $p$, where the index set $\{1,\dots,p\}$ is covered by the blocks $J_k \subset \{1,\dots,p\}$, $k = 1,\dots,K = 5$. The first block $J_1$ consists of the first two parameters and the 19th parameter, which is $\gamma_0$, $J_2 = \{3,\dots,9\}$, the third block $J_3 = \{10,\dots,15\}$, while $J_4 = \{16,17,18\}$. Finally, the fifth block $J_5$ contains the volatility parameters. For the parameter ordering see the first column of Table 1.

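Within updating step $m$, we consider the sub-steps $k = 1, \ldots, K$, where $\vartheta^{(m,k)}$ stands for the parameter vector in MCMC step $m$ at sub-step $k$. Let $\vartheta^{old} = \vartheta^{(m-1)}$ for $k = 1$ and $\vartheta^{old} = \vartheta^{(m,k-1)}$ for $k = 2, \ldots, K$. When the block $J_k$ is considered, $[\vartheta^{old}]_i$, $i \in J_k$, is updated. To update $[\vartheta^{old}]_i$, $i \in J_k$, a random walk proposal with proposal density $q([\vartheta^{new}]_i \mid [\vartheta^{old}]_i) = f_{\mathcal{N}([\vartheta^{old}]_i, \sigma^2_{RW,i})}([\vartheta^{new}]_i)$ is used, where $f_{\mathcal{N}(\cdot,\cdot)}(\cdot)$ stands for a normal density. In the random walk proposals, we use small standard deviations of the noise in relative terms. In particular, $\sigma_{RW,i} = 0.01\,[|\vartheta^{old}|]_i$ with a probability of 90%; for the remaining 10% we set the standard deviation of this noise term equal to $\sigma_{RW,i} = 0.005\,[|\vartheta^{old}|]_i$. By applying these proposals to all elements $i \in J_k$, we get the parameter vector $\vartheta^{new}$ and the proposal density $q(\vartheta^{new} \mid \vartheta^{old}) = \prod_{i \in J_k} q([\vartheta^{new}]_i \mid [\vartheta^{old}]_i)$. For the remaining components, $[\vartheta^{new}]_i = [\vartheta^{old}]_i$, where $i$ is not contained in the block $J_k$. Equipped with $Q_T(\vartheta^{new}; y_{1:T})$ and $Q_T(\vartheta^{old}; y_{1:T})$, the prior $\pi(\cdot)$ and the proposal densities $q(\cdot)$, the Metropolis-Hastings algorithm can be used. Let $\ell(\vartheta) = \exp\left[-\tfrac{1}{2} T Q_T(\vartheta; y_{1:T})\right]$. The GMM distance function $Q_T(\vartheta; y_{1:T})$ is defined in (23), where $C_T = \Lambda_T^{-1}$ with $\Lambda_T$ obtained from the terms $h_{(t)}(\vartheta^{(m-1)}; y_{1:T})$. Then, a transition from $\vartheta^{old}$ to $\vartheta^{new}$ is accepted with probability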


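\[
\varrho(\vartheta^{old}, \vartheta^{new}) = \min\left\{ 1, \frac{\ell(\vartheta^{new})\, \pi(\vartheta^{new})\, q(\vartheta^{old} \mid \vartheta^{new})}{\ell(\vartheta^{old})\, \pi(\vartheta^{old})\, q(\vartheta^{new} \mid \vartheta^{old})} \right\}. \tag{75}
\]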


To implement this Metropolis-Hastings step, we draw a uniform random variable on $[0,1]$ and accept $\vartheta^{new}$, i.e. $\vartheta^{(m,k)} = \vartheta^{new}$, if this uniform random variable is smaller than or equal to $\varrho(\vartheta^{old}, \vartheta^{new})$; otherwise $\vartheta^{(m,k)} = \vartheta^{old}$. By our assumptions on the prior, it follows that $\pi(\vartheta^{new}) = \pi(\vartheta^{old})$ as long as $\vartheta^{new} \in \Theta$. Whenever $\vartheta^{new} \notin \Theta$, the probability $\varrho$ is equal to zero. Due to the random walk proposal described above, we observe that $q(\vartheta^{old} \mid \vartheta^{new}) \approx q(\vartheta^{new} \mid \vartheta^{old})$. $\vartheta^{old}$ is then updated such that it becomes equal to the current $\vartheta^{(m,k)}$. After having performed these updating steps for all blocks, $k = 1, \ldots, K$, we obtain $\vartheta^{(m)}$.

Footnote: The index of the sub-step $k$ is not applied when it is not essential.
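
The blockwise updating scheme described above can be illustrated by the following Python sketch. It is a simplified illustration and not the sampler used for the results in this paper. The names `block_mh_sweep` and `make_log_quasi_posterior`, the toy distance function, the flat prior, and the scaling $-\tfrac{1}{2} T Q_T$ of the quasi-log-likelihood are assumptions of the sketch. The proposal densities are treated as cancelling in the acceptance ratio, which is itself an approximation because the proposal standard deviation depends on the current value.

```python
import math
import random


def block_mh_sweep(theta, blocks, log_quasi_posterior, rng):
    """One MCMC step: update the parameter vector block by block."""
    current = list(theta)
    current_lp = log_quasi_posterior(current)
    accepted = []
    for block in blocks:
        proposal = list(current)
        for i in block:
            # Relative standard deviation: 1% with probability 0.9, else 0.5%.
            rel_sd = 0.01 if rng.random() < 0.9 else 0.005
            proposal[i] = rng.gauss(current[i], rel_sd * abs(current[i]))
        proposal_lp = log_quasi_posterior(proposal)
        # Proposal densities are treated as cancelling (approximation), so
        # only the quasi-posterior ratio enters the acceptance probability.
        if proposal_lp == -math.inf:
            accept_prob = 0.0  # proposal outside the support of the prior
        else:
            accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
            accepted.append(True)
        else:
            accepted.append(False)
    return current, accepted


def make_log_quasi_posterior(distance, n_obs):
    """Log of exp(-0.5 * T * Q_T(theta)) under a flat prior (assumption)."""
    return lambda theta: -0.5 * n_obs * distance(theta)


rng = random.Random(1)
toy_distance = lambda th: sum((x - 1.0) ** 2 for x in th)  # stand-in for Q_T
log_post = make_log_quasi_posterior(toy_distance, n_obs=500)
theta = [1.2, 0.8, 1.1]
for _ in range(200):
    theta, flags = block_mh_sweep(theta, [[0, 1], [2]], log_post, rng)
```

Working on the log scale avoids numerical underflow of $\exp[\cdot]$ when $T Q_T$ is large, which is a design choice of this sketch.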


To improve the properties of the Bayesian sampler in the case when $\theta^Q = \theta^P$ or when $\theta^Q \neq \theta^P$, a reversible jump move based on Green (1995) and Richardson and Green (1997) has been implemented.


Suppose that $\vartheta^{(m)}$ has been obtained by the above steps. Let $\vartheta^{old} = \vartheta^{(m)}$. With a probability of 90% we add the following step to sampling step $m$: Consider the state $s_1$, where $\theta^Q = \theta^P$, and the state $s_2$, where $\theta^Q \neq \theta^P$. The state $S$ is a Bernoulli distributed random variable with prior probability $P(S = s_1) = p_{s_1} = 0.90$. By applying Green (1995), transitions from $\{S = s_1\}$ to $\{S = s_2\}$ and vice versa can be performed by means of the Metropolis-Hastings algorithm. In particular, consider the uniformly distributed random variable $\eta$, as well as the normal iid random variables $u$ and $u_\gamma$. The proposal densities are $f_{\mathcal{N}(0,\sigma_u^2)}(u)$ and $f_{\mathcal{N}(0,\sigma_\gamma^2)}(u_\gamma)$. Let $\{S = s_1\}$, where $\theta^{old} = \theta^Q = \theta^P$. A possible split transition from $\{S = s_1\}$ to $\{S = s_2\}$ works as follows:


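\[
\begin{aligned}
\theta^{P,new} &= \theta^{old} - 2\eta u, \\
\theta^{Q,new} &= \theta^{old} + 2(1-\eta) u, \\
\gamma_0^{new} &= \gamma_0^{old} - 2\eta u + u_\gamma.
\end{aligned} \tag{76}
\]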


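By replacing the corresponding elements in $\vartheta^{old}$ by $\theta^{P,new}$, $\theta^{Q,new}$ and $\gamma_0^{new}$, we get the new parameter vector $\vartheta^{new}$. Let the old state be $(s_1; \theta^{old}, \gamma_0^{old})$ and the new state be $(s_2; \theta^{P,new}, \theta^{Q,new}, \gamma_0^{new})$. By taking partial derivatives of the terms in (76), we obtain the Jacobian matrix

\[
J = \frac{\partial(\theta^{P,new}, \theta^{Q,new}, \eta, \gamma_0^{new}, u_\gamma)}{\partial(\theta^{old}, \eta, \gamma_0^{old}, u, u_\gamma)} =
\begin{pmatrix}
1 & -2u & 0 & -2\eta & 0 \\
1 & -2u & 0 & 2(1-\eta) & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & -2u & 1 & -2\eta & 1 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}. \tag{77}
\]

Footnote: Note that $\vartheta^{old}$ contains the old parameters, where $\theta^Q = \theta^P$. In the notation of Green (1995), the dimension of the parameter of interest with state $s_1$ is $n_1 = 2$ (consisting of $\theta^{old}$ and $\gamma_0^{old}$), and the dimension of the noise component is $m_1 = 3$ (due to $\eta$, $u$ and $u_\gamma$). With $s_2$ we get $n_2 = 3$ (consisting of $\theta^{P,new}$, $\theta^{Q,new}$ and $\gamma_0^{new}$) and $m_2 = 2$ (due to $\eta$ and $u_\gamma$). This yields $n_1 + m_1 = n_2 + m_2$.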


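The determinant of the matrix $J$ is equal to 2. Given the proposal densities $q(u) = f_{\mathcal{N}(0,\sigma_u^2)}(u)$ and $q(u_\gamma) = f_{\mathcal{N}(0,\sigma_\gamma^2)}(u_\gamma)$ for $u$ and $u_\gamma$, a transition from $(s_1; \theta^{old}, \gamma_0^{old})$ to $(s_2; \theta^{P,new}, \theta^{Q,new}, \gamma_0^{new})$ is accepted with probability (see Green, 1995, equation (7))

\[
\varrho = \min\left\{1, \frac{\ell(\vartheta^{new})\,\pi(\vartheta^{new})\,(1 - p_{s_1})\, f_{\mathcal{N}(0,\sigma_\gamma^2)}(u_\gamma^{old})}{\ell(\vartheta^{old})\,\pi(\vartheta^{old})\, p_{s_1}\, f_{\mathcal{N}(0,\sigma_\gamma^2)}(u_\gamma^{new})\, f_{\mathcal{N}(0,\sigma_u^2)}(u)} \, |\det J| \right\}
= \min\left\{1, \frac{2\,\ell(\vartheta^{new})\,\pi(\vartheta^{new})\,(1 - p_{s_1})}{\ell(\vartheta^{old})\,\pi(\vartheta^{old})\, p_{s_1}\, f_{\mathcal{N}(0,\sigma_u^2)}(u)} \right\}. \tag{78}
\]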


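Since $u_\gamma^{old} = \gamma_0^{new} - \gamma_0^{old} + 2\eta u$, the densities $f_{\mathcal{N}(0,\sigma_\gamma^2)}$ cancel out in (78). An equivalent Metropolis-Hastings move can be performed without an update of $\gamma_0$. A possible merge transition from $\{S = s_2\}$ to $\{S = s_1\}$ works as follows:

\[
\begin{aligned}
\theta^{new} &= \theta^{Q,new} = \theta^{P,new} = (1-\eta)\,\theta^{P,old} + \eta\,\theta^{Q,old}, \\
\gamma_0^{new} &= \gamma_0^{old} - 2\eta u + u_\gamma, \quad \text{such that} \\
u &= \frac{\theta^{P,old} - \theta^{new}}{-2\eta} = \frac{\theta^{Q,old} - \theta^{new}}{2(1-\eta)}.
\end{aligned} \tag{79}
\]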
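
The split and merge maps can be checked numerically. The following Python sketch is an illustration under stated assumptions: it implements the split map implied by the Jacobian matrix in (77), verifies that the merge step recovers $\theta^{old}$ and $u$, and approximates the Jacobian determinant by central differences. The function names and the test point are hypothetical, and $u$ is recovered as $(\theta^Q - \theta^P)/2$, which equals both ratios in (79) whenever $\eta \in (0,1)$.

```python
def split_map(x):
    """Map (theta_old, eta, gamma_old, u, u_gamma) to the split state."""
    theta_old, eta, gamma_old, u, u_gamma = x
    theta_p = theta_old - 2.0 * eta * u
    theta_q = theta_old + 2.0 * (1.0 - eta) * u
    gamma_new = gamma_old - 2.0 * eta * u + u_gamma
    return [theta_p, theta_q, eta, gamma_new, u_gamma]


def merge(theta_p, theta_q, eta):
    """Recover the merged theta and the noise term u from a split pair."""
    theta_new = (1.0 - eta) * theta_p + eta * theta_q
    u = 0.5 * (theta_q - theta_p)
    return theta_new, u


def numerical_jacobian(func, x, step=1e-6):
    """Central-difference Jacobian of func at x (rows: outputs, columns: inputs)."""
    n_out = len(func(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        up, down = list(x), list(x)
        up[j] += step
        down[j] -= step
        f_up, f_down = func(up), func(down)
        for i in range(n_out):
            jac[i][j] = (f_up[i] - f_down[i]) / (2.0 * step)
    return jac


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-14:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


point = [1.5, 0.3, 2.0, 0.4, -0.1]  # hypothetical (theta_old, eta, gamma_old, u, u_gamma)
state = split_map(point)
theta_back, u_back = merge(state[0], state[1], state[2])
det_value = determinant(numerical_jacobian(split_map, point))
```

At this test point the round trip returns the original $\theta^{old}$ and $u$ up to rounding, and the numerical determinant is close to 2.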


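By means of (79) we get $\theta^{new}$ and $\gamma_0^{new}$. Then, a transition from $(s_2; \theta^{P,old}, \theta^{Q,old}, \gamma_0^{old})$ to $(s_1; \theta^{new}, \gamma_0^{new})$ is accepted with probability (Green, 1995, equation (7)):

\[
\varrho = \min\left\{1, \frac{\ell(\vartheta^{new})\,\pi(\vartheta^{new})\, p_{s_1}}{\ell(\vartheta^{old})\,\pi(\vartheta^{old})\,(1 - p_{s_1})} \cdot \frac{1}{2}\, f_{\mathcal{N}(0,\sigma_u^2)}(u) \right\}. \tag{80}
\]

If either a split or a merge transition is accepted, we set $\vartheta^{(m)} = \vartheta^{new}$. After a merge move, $\theta^Q = \theta^P$ in updating sub-step $k = 1$ until a split move takes place.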

Parameter Estimation: To obtain the parameter estimates $\hat{\vartheta}$, we consider the draws $\vartheta^{(m)}$, where $m = M_b + 1, \ldots, M$, of the convergent part of the Markov chain. We work with $M_b = 5{,}000$ and $M = 20{,}000$. Then $\hat{\vartheta}$ is provided by the sample mean. Tables 1 and 2 show parameter estimates obtained by using the Bayesian algorithm described above. In addition, as shown by Chernozhukov and Hong (2003), the draws after the burn-in phase can also be used to estimate the asymptotic variance of the parameters. To do this, we can simply calculate the sample variance of $\vartheta^{(m)}$, where $m = M_b + 1, \ldots, M$. To account for the serial correlation observed with the Markov chain, we follow the Bayesian literature and estimate the variance of the components of $\hat{\vartheta}$ by means of the batch-means approach described in Flegal and Jones (2010) [in particular, Equation (6) is used].
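
A minimal sketch of the point estimate and of a batch-means variance estimate for one component of the chain is given below. It uses a standard non-overlapping batch-means estimator with batch size $\lfloor \sqrt{n} \rfloor$; whether this agrees in every detail with Equation (6) of Flegal and Jones (2010) is an assumption. The function names and the toy chain are hypothetical.

```python
import math


def posterior_mean(chain, burn_in):
    """Sample mean of the draws after the burn-in phase."""
    kept = chain[burn_in:]
    return sum(kept) / len(kept)


def batch_means_variance(chain, burn_in, batch_size=None):
    """Batch-means estimate of the variance of the post-burn-in sample mean."""
    kept = chain[burn_in:]
    n = len(kept)
    b = batch_size if batch_size is not None else int(math.sqrt(n))
    a = n // b  # number of complete batches
    if a < 2:
        raise ValueError("need at least two complete batches")
    used = kept[n - a * b:]  # drop the earliest draws that do not fill a batch
    overall = sum(used) / (a * b)
    batch_avgs = [sum(used[k * b:(k + 1) * b]) / b for k in range(a)]
    sigma2 = b / (a - 1) * sum((m - overall) ** 2 for m in batch_avgs)
    return sigma2 / (a * b)


# Toy chain: five burn-in values followed by a strongly trending sequence.
toy_chain = [999.0] * 5 + [float(i) for i in range(100)]
estimate = posterior_mean(toy_chain, burn_in=5)
variance_of_mean = batch_means_variance(toy_chain, burn_in=5)
```

With $M_b = 5{,}000$ and $M = 20{,}000$ as above, this default would give 122 batches of 122 draws each, discarding the first 116 post-burn-in draws.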


Monte Carlo Study: In the simulation studies described in Section Steps 1 and 2 are performed 
for each Monte Carlo replication ($l = 1, \ldots, L = 200$).


Remark 2. The implementation of the Quasi-Bayesian sampler based on Chernozhukov and Hong (2003) is not "free of cost". Running multistart random search methods and a standard minimization procedure and then performing the Wald test based on (26) takes approximately 20 minutes, while one full estimation step based on running a random search and then obtaining 20,000 draws from a Markov chain lasts for approximately 24 hours on the same standard PC.
Parameter                           $\vartheta$      mean    median       min       max       std      skew      kurt  $|\hat{\vartheta} - \vartheta|$
$[\vartheta]_{1} = \theta^Q$                 10   10.3593    8.9022    1.0527   69.9629    6.0544    3.4352   23.6299    0.3593
$[\vartheta]_{2} = \theta^P$                1.5    1.5046    1.2986    0.0676    6.4437    1.0296    1.3471    5.1909    0.0046
$[\vartheta]_{3}$                            -1   -1.2823   -1.0328   -7.6430   -0.1108    0.9617   -2.0173    9.4853    0.2823
$[\vartheta]_{4}$                           0.2    0.2523    0.1729    0.0099    2.8282    0.2549    3.2707   22.3280    0.0523
$[\vartheta]_{5} = \beta^Q_{31}$           0.02    0.0326    0.0204    0.0009    0.5962    0.0416    5.2132   49.7281    0.0126
$[\vartheta]_{6} = \beta^Q_{22}$             -1   -1.5493   -1.4686   -4.2679   -0.1046    0.7283   -0.5842    3.2132    0.5493
$[\vartheta]_{7}$                          0.04    0.0375    0.0354   -0.0734    0.1586    0.0404    0.1497    2.8928    0.0025
$[\vartheta]_{8}$                             0   -0.0005   -0.0002   -0.0343    0.0303    0.0097    0.0013    3.1280    0.0005
$[\vartheta]_{9}$                            -1   -1.5042   -1.4266   -4.7165   -0.0664    0.7906   -0.5289    3.0549    0.5042
$[\vartheta]_{10}$                         -0.8   -1.6204   -0.8868  -43.8618   -0.0503    2.9373   -8.3139   99.3434    0.8204
$[\vartheta]_{11}$                         0.02    0.0330    0.0210    0.0013    0.3927    0.0378    3.0352   17.4456    0.0130
$[\vartheta]_{12}$                         0.01    0.0168    0.0102    0.0004    0.2022    0.0212    4.1593   27.5666    0.0068
$[\vartheta]_{13} = \beta^P_{22}$          -0.7   -0.9193   -0.8646   -3.1251    0.2598    0.5646   -0.5446    2.8395    0.2193
$[\vartheta]_{14}$                         0.01    0.0094    0.0094   -0.0182    0.0433    0.0099   -0.5446    2.8395    0.0006
$[\vartheta]_{15}$                            0    0.0000   -0.0003   -0.0316    0.0305    0.0099    0.0824    2.8274    0.0000
$[\vartheta]_{16}$                         -0.7   -0.9199   -0.8383   -3.0770    0.2518    0.5418   -0.5914    3.0650    0.2199
$[\vartheta]_{17}$                         0.05    0.0791    0.0496    0.0029    1.2802    0.0964    4.3127   35.8452    0.0291
$[\vartheta]_{18}$                          0.1    0.1590    0.0978    0.0025    2.1414    0.1969    4.3398   32.1066    0.0590
$[\vartheta]_{19} = \gamma_0$                 2    2.1224    2.1483   -4.1375    6.2455    1.6726   -0.2254    2.9938    0.1224
$[\vartheta]_{20} = \Sigma_1$               0.7    0.5450    0.4636    0.0176    2.7764    0.3771    1.6212    7.1421    0.1550
$[\vartheta]_{21} = \Sigma_2$                 1    1.0037    0.7162    0.0267    5.8461    0.8681    1.9397    7.7289    0.0037
$[\vartheta]_{22} = \Sigma_3$               0.8    0.8538    0.6104    0.0253    8.2162    0.8849    3.6164   22.8961    0.0538
$[\vartheta]_{23}$                       0.0067    0.0119    0.0068    0.0003    0.2908    0.0180    7.8621  102.5576    0.0051

Table 5: Parameter estimates for the $A_1(3)$ model. Data simulated with $M = 10$ and $T = 500$. Estimation based on using fminsearch. $c_\vartheta = 1$ controls the noise in the generation of the starting value of the optimization routine. Statistics are obtained from 1,000 simulation runs. mean, median, min, max, std, skew and kurt stand for the sample mean, median, minimum, maximum, standard deviation, skewness and kurtosis of the point estimates $\hat{\vartheta}_l$, $l = 1, \ldots, 1{,}000$. $|\hat{\vartheta} - \vartheta|$ stands for the absolute value of the mean deviation from the true parameter. The true parameter values $\vartheta$ are reported in the second column.
              $\theta^Q = 10 \neq 1.5 = \theta^P$      $\theta^Q = \theta^P = 1.5$
$\alpha_S$        Wald        DD                          Wald        DD
0.01             0.018     0.545                         0.015     0.057
0.05             0.028     0.583                         0.021     0.062
0.10             0.043     0.623                         0.025     0.065

Table 6: Parameter tests: Data are simulated with $M = 10$, $T = 500$ and $c_\vartheta = 1$. $[\vartheta]_1 = \theta^Q$ and $[\vartheta]_2 = \theta^P$, and the remaining elements of $\vartheta$ are equal to those of the second column in Table 5. $\alpha_S$ stands for the significance level; $c_\vartheta$ controls the noise in the generation of the starting value of the optimization routine. The null hypothesis is $\theta^Q = \theta^P$ against the two-sided alternative $\theta^Q \neq \theta^P$. The parameters are estimated by combining multistart random search methods and a standard minimization procedure. The Wald test as well as the distance difference test (DD) are implemented as described in Chapter 22 in Ruud (2000). Equation (26) is used to estimate the asymptotic variance of $\sqrt{T}\hat{\vartheta}$ with the Wald test, while $\Lambda_T$, as presented in (26), is used with the distance difference test. The numbers in the table are rejection rates of the null hypothesis given the significance level $\alpha_S$, when using a Wald test and a distance difference test. Statistics are obtained from 1,000 simulation runs.